Elsevier

Clinical Neurophysiology

Volume 118, Issue 12, December 2007, Pages 2544-2590

Invited review
The mismatch negativity (MMN) in basic research of central auditory processing: A review

https://doi.org/10.1016/j.clinph.2007.04.026

Abstract

In the present article, we review basic research using the mismatch negativity (MMN), together with analogous results obtained with magnetoencephalography (MEG) and other brain-imaging technologies. The MMN is elicited by any discriminable change in auditory stimulation, but recent studies have extended the notion of the MMN even to higher-order cognitive processes such as those involving grammar and semantic meaning. Moreover, MMN data show the presence of automatic intelligent processes such as stimulus anticipation at the level of the auditory cortex. In addition, the MMN enables one to establish the brain processes underlying the initiation of an attention switch to, and the conscious perception of, a sound change in an unattended stimulus stream.

Keywords

Mismatch negativity (MMN)
Event-related potential (ERP)
Central auditory processing
Auditory cortex

1. The mismatch negativity (MMN): an introduction

In the present article, we will review the recent literature on the auditory mismatch negativity (MMN; Näätänen et al., 1978, Näätänen, 1979, Näätänen and Michie, 1979), a change-specific component of the auditory event-related brain potential (ERP). We will focus on MMN studies of basic cognitive brain research.

MMN studies of central auditory function have recently become very popular. This is because the MMN has opened an unprecedented window onto central auditory processing and its underlying neurophysiology, which are affected in a large number of different clinical conditions. The MMN enables one to reach a new level of understanding of the brain processes forming the biological substrate of central auditory perception and of the different forms of auditory memory, as well as of the attentional processes controlling the access of auditory sensory input to conscious perception and higher forms of memory. In addition to its magnetoencephalographic (MEG) equivalent, the MMNm (Hari et al., 1984, Csépe et al., 1992, Levänen et al., 1993, Levänen et al., 1996), the MMN also has optical imaging (OI) (Rinne et al., 1999b, Tse et al., 2006), functional magnetic resonance imaging (fMRI) (Celsis et al., 1999, Opitz et al., 2002, Molholm et al., 2005), positron emission tomography (PET) (Tervaniemi et al., 2000a, Dittmann-Balcar et al., 2001, Müller et al., 2002), and intracranial (Kropotov et al., 1995, Kropotov et al., 2000, Halgren et al., 1995, Baudena et al., 1995, Liasis et al., 1999, Liasis et al., 2000a, Liasis et al., 2000b, Liasis et al., 2001) equivalents.

The “traditional” MMN is generated by the brain’s automatic response to any change in auditory stimulation that exceeds a certain limit, roughly corresponding to the behavioural discrimination threshold. The MMN is illustrated in Fig. 1. An analogous response also occurs in the other sensory modalities: in the somatosensory modality (Kekoni et al., 1997, Shinozaki et al., 1998, Akatsuka et al., 2005, Akatsuka et al., 2007), in the olfactory modality (Krauel et al., 1999; for a review, see Pause and Krauel, 2000), and in the visual modality (Alho et al., 1992, Tales et al., 1999, Heslenfeld, 2003, Maekawa et al., 2005, Fu et al., 2003, Czigler et al., 2002, Czigler et al., 2004, Pazo-Alvarez et al., 2003, Stagg et al., 2004, Astikainen et al., 2004; see, however, Kenemans et al., 2003, Kimura et al., 2005, and Nyman et al., 1990; for reviews, see Pazo-Alvarez et al., 2003, Maekawa et al., 2005, Czigler, in press).


Fig. 1. (Left) Frontal (Fz) event-related potentials (ERPs) (averaged across subjects) to randomized 1000 Hz standard (80%, black line) and to deviant (20%, green line) stimuli of different frequencies (as indicated on the left side). (Right) The difference waves obtained by subtracting the standard stimulus ERP from that of the deviant stimulus for the different deviant stimuli. Subjects were reading a book. Adapted, with permission, from Sams et al. (1985a).

Moreover, the MMN is also elicited by different kinds of abstract changes in auditory stimulation such as grammar violations in mother-tongue sentences – these higher-order MMNs will be reviewed in later sections of this article.

The MMN appears as a negative displacement, particularly at the frontocentral and central scalp electrodes (relative to a mastoid or nose reference electrode), in the difference wave obtained by subtracting the event-related potential (ERP) to frequent, “standard”, stimuli from that to deviant stimuli. However, one has to take into account possible differences in the obligatory ERPs between standards and deviants. These differences may result from physical stimulus differences between standards and deviants and from differences in the refractoriness of the neural populations activated by the two stimuli because of their probability difference (see Walker et al., 2001). In general, however, these differences in the obligatory components are rather small in amplitude and mainly involve the N1 time zone; therefore, post-N1 measurements of the MMN usually provide quite reliable estimates of the “genuine” MMN. In addition, the MMN usually reverses polarity at the mastoids when recorded with a nose reference. See also Deacon et al. (2000).
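The subtraction and the post-N1 measurement described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' analysis pipeline. The function names are hypothetical. The ERPs are assumed to be already-averaged single-channel waveforms on a shared time axis in milliseconds. The default 150–250 ms window, taken from the range in which the MMN typically peaks, is an assumed choice. It would need adjusting to the magnitude of stimulus change and to the N1 latency in a given study.

```python
import numpy as np


def difference_wave(erp_deviant, erp_standard):
    """Deviant-minus-standard difference wave for one channel."""
    dev = np.asarray(erp_deviant, dtype=float)
    std = np.asarray(erp_standard, dtype=float)
    if dev.shape != std.shape:
        raise ValueError("ERPs must share the same time axis")
    return dev - std


def _window_mask(times_ms, t_start_ms, t_end_ms):
    """Boolean mask selecting samples inside [t_start_ms, t_end_ms]."""
    times_ms = np.asarray(times_ms, dtype=float)
    mask = (times_ms >= t_start_ms) & (times_ms <= t_end_ms)
    if not mask.any():
        raise ValueError("measurement window contains no samples")
    return mask


def mean_amplitude(wave, times_ms, t_start_ms=150.0, t_end_ms=250.0):
    """Mean amplitude within an (assumed) post-N1 measurement window."""
    mask = _window_mask(times_ms, t_start_ms, t_end_ms)
    return float(np.mean(np.asarray(wave, dtype=float)[mask]))


def negative_peak_latency(wave, times_ms, t_start_ms=150.0, t_end_ms=250.0):
    """Latency (ms) of the most negative sample within the window."""
    mask = _window_mask(times_ms, t_start_ms, t_end_ms)
    times_in_window = np.asarray(times_ms, dtype=float)[mask]
    return float(times_in_window[np.argmin(np.asarray(wave, dtype=float)[mask])])


# Toy example with synthetic waveforms (not real data), 1-ms sampling.
times = np.arange(-100.0, 401.0, 1.0)
standard = np.zeros_like(times)
deviant = -2.0 * np.exp(-0.5 * ((times - 180.0) / 30.0) ** 2)
mmn = difference_wave(deviant, standard)
print(round(mean_amplitude(mmn, times), 2), negative_peak_latency(mmn, times))
```

Mean amplitude in a fixed window is often preferred over peak picking because it is less sensitive to noise. The peak-latency helper is included because latency itself varies with the magnitude of change.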

The MMN usually peaks at 150–250 ms from change onset, and its peak latency shortens as the magnitude of stimulus change increases (Sams et al., 1985a, Tiitinen et al., 1994, Näätänen et al., 1989a, Näätänen et al., 1989b, Amenedo and Escera, 2000). A prerequisite for MMN elicitation is that, before the deviant stimulus occurs, the central auditory system has been able to form a representation of the repetitive aspects of auditory stimulation (Winkler et al., 1996a, Winkler et al., 1996b, Horváth et al., 2001; see also Winkler et al., 1999a, Winkler et al., 1999b, Huotilainen et al., 1993, Paavilainen et al., 1993a; for a review, see Näätänen and Winkler, 1999). An MMN is then elicited by a stimulus that violates this representation. Most studies have used simple paradigms in which frequent and infrequent stimuli (e.g., tones of 1000 and 1100 Hz, respectively) were presented in a random order, with the infrequent sound eliciting an MMN (Näätänen et al., 1978, Sams et al., 1985a). The MMN can, however, also be elicited by changes in complex stimuli such as speech sounds (Dehaene-Lambertz, 1997, Näätänen et al., 1997) and even by stimuli that deviate from an abstract rule followed by the ongoing auditory stimulation, such as a tone repetition in a sequence of descending tones (Tervaniemi et al., 1994a), i.e., even when there is no acoustically constant standard stimulus (see Fig. 2).
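A randomized oddball sequence of the kind described above can be generated as follows. This is a hypothetical helper that uses the 1000/1100 Hz example. It fixes the exact number of deviants and then shuffles the sequence, rather than drawing each trial independently. Real protocols often add constraints, such as a minimum number of standards before each deviant, and these are omitted here.

```python
import random


def oddball_sequence(n_trials, p_deviant, standard_hz=1000, deviant_hz=1100, seed=None):
    """Return a shuffled list of tone frequencies with an exact deviant count."""
    if not 0.0 < p_deviant < 1.0:
        raise ValueError("p_deviant must lie strictly between 0 and 1")
    n_deviants = round(n_trials * p_deviant)
    sequence = [deviant_hz] * n_deviants + [standard_hz] * (n_trials - n_deviants)
    random.Random(seed).shuffle(sequence)  # seeded for reproducibility
    return sequence


seq = oddball_sequence(500, 0.2, seed=1)
print(seq[:10], seq.count(1100) / len(seq))
```

Fixing the deviant count guarantees the nominal probability in every block. Independent per-trial draws would only approximate it.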


Fig. 2. (a) Spectrum of an individual Shepard sound. When such sounds are presented in ascending or descending sequences of 12 sounds in one-semitone steps, the pitch appears to ascend or descend endlessly. One Shepard sound consists of 10 frequency components spaced one octave apart, with a bell-shaped spectrum. While a 12-tone series of Shepard sounds is delivered, the perception of tone height (which is equivalent to the sense of octave) is made to disappear by manipulating the sound spectrum. (b) A visual analogy of the Shepard illusion: the endlessly ascending or descending stairs. (c) The ERPs recorded at the frontal (Fz) electrode from reading subjects to Shepard (left) and sinusoidal (right) tones (thin line: standard stimulus; thick line: deviant stimulus). Left column: a regularly descending Shepard sound sequence in which tones were occasionally and randomly replaced by a repeating (top) or an ascending (bottom) tone (deviant). Right column: a regularly descending sinusoidal tone sequence with occasional repeating (top) or ascending (bottom) deviants. The arrow indicates the deviant-stimulus onset and the shaded area the statistically significant part of the mismatch negativity. Adapted, with permission, from Tervaniemi et al. (1994a).
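The construction of a Shepard sound summarized in the caption can be sketched as below. This is an illustrative synthesis, not the stimulus code of Tervaniemi et al. (1994a). The following are all assumptions:

- the sampling rate;
- the duration;
- the lowest component frequency;
- the Gaussian shape (on a log-frequency axis) and width of the spectral envelope.

The essential design choice is that the envelope stays fixed while the ten octave-spaced components shift beneath it by one semitone per step. After 12 steps, the sound therefore returns to its starting spectrum.

```python
import numpy as np


def shepard_tone(step, fs=44100, dur_s=0.1, n_components=10,
                 f_lowest=20.0, sigma_octaves=1.5):
    """Return one Shepard sound for a given semitone step (any integer)."""
    t = np.arange(int(round(fs * dur_s))) / fs
    # Components one octave apart, shifted by `step` semitones (mod 12).
    f0 = f_lowest * 2.0 ** ((step % 12) / 12.0)
    freqs = f0 * 2.0 ** np.arange(n_components)
    # Fixed bell-shaped envelope over log2-frequency, centred on the
    # middle of the component range; it does not move with `step`.
    centre = np.log2(f_lowest) + (n_components - 1) / 2.0
    amps = np.exp(-0.5 * ((np.log2(freqs) - centre) / sigma_octaves) ** 2)
    signal = (amps[:, None] * np.sin(2.0 * np.pi * freqs[:, None] * t)).sum(axis=0)
    return signal / np.max(np.abs(signal))  # normalise peak amplitude to 1


# A descending sequence of 12 sounds in one-semitone steps.
descending = [shepard_tone(-k) for k in range(12)]
```

With these assumed defaults, the highest component stays below the Nyquist frequency at 44.1 kHz. Lowering `f_lowest` or reducing `n_components` would be needed at lower sampling rates.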

Very importantly, particularly in view of clinical and other potential applications, the MMN is elicited irrespective of the direction of the subject’s or patient’s attention (Näätänen, 1979, Näätänen, 1985, Näätänen et al., 1978). Hence, no behavioural task is needed; in fact, tasks are used to direct the subject’s attention away from the MMN-eliciting stimulus sequence in order to prevent the elicitation of attention-dependent ERP components that would overlap the MMN (e.g., the N2b; Renault and Lesévre, 1978, Renault and Lesévre, 1979, Näätänen et al., 1982, Sams et al., 1985a, Sams et al., 1990, Novak et al., 1990, Novak et al., 1992; for a review, see Näätänen and Gaillard, 1983).

The MMN receives contributions from at least two intracranial processes: (1) a bilateral supratemporal process generating the supratemporal MMN subcomponent (and the polarity-reversed “MMN” in nose-referenced mastoid recordings), and (2) a predominantly right-hemispheric frontal process generating the frontal MMN subcomponent (Näätänen et al., 1978, Giard et al., 1990, Baldeweg et al., 1999, Rinne et al., 2000). The supratemporal component is presumably associated with pre-perceptual change detection, whereas the frontal component appears to be related to the involuntary attention switch caused by auditory change (Näätänen et al., 1978, Näätänen and Michie, 1979, Giard et al., 1990, Rinne et al., 2000, Escera et al., 1998, Schröger, 1996a, Schröger, 1996b). The MMN generators also reflect the nature of the stimulus; for example, they are usually left-lateralized for language stimuli (Näätänen et al., 1997, Shtyrov et al., 2005, Pulvermüller et al., 2003).

2. Memory dependence of MMN elicitation

The MMN depends on the presence of a short-term memory trace in the auditory cortex that represents the repetitive aspects of the preceding auditory events and usually lasts for a few seconds. Therefore, single sounds, even deviants, that are not preceded by other sounds within the last few seconds elicit no MMN but rather enhanced obligatory P1, N1, and P2 responses (Näätänen and Picton, 1987, Korzyukov et al., 1999). Hence, the MMN seen in the difference wave obtained by subtracting the standard-stimulus ERP from that to the deviant stimulus is not simply due to a larger exogenous response to an infrequent than to a frequent stimulus (for a review, see Näätänen et al., 2005). Rather, the MMN mainly reflects the outcome of a discrimination process in which the deviant event is found to be incongruent with the memory representation of the preceding stimuli (even in the absence of attention).

However, as already mentioned, fresh afferent neural populations activated by deviant stimuli, such as tones of a clearly different frequency, often contribute to the difference-wave negativity, usually to its early part (the N1 time zone; Näätänen et al., 1988). Moreover, standards and deviants might elicit different exogenous responses even when each is presented alone in a stimulus block (e.g., tones of very different frequency or duration; for a review, see Näätänen and Picton, 1987). For this reason, the difference wave is often formed by subtracting the ERP to a given stimulus presented as a standard in one stimulus block from the ERP to the same stimulus presented as a deviant in another block (e.g., Deacon et al., 2000).
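The cross-block subtraction described above can be illustrated with a reversed-role ("flip-flop") design, in which two tones swap the standard and deviant roles across blocks. The sketch below rests on several hypothetical choices: the block labels, the dictionary layout of the averaged ERPs, and the averaging of the two same-stimulus difference waves. It is not a procedure prescribed by the cited studies.

```python
import numpy as np


def same_stimulus_difference(erps, tone, deviant_block, standard_block):
    """ERP to `tone` as deviant minus ERP to the same `tone` as standard.

    `erps` maps (block_label, tone) -> averaged waveform (hypothetical layout).
    Both waveforms come from a physically identical stimulus, so purely
    stimulus-specific (exogenous) differences cancel in the subtraction.
    """
    return (np.asarray(erps[(deviant_block, tone)], dtype=float)
            - np.asarray(erps[(standard_block, tone)], dtype=float))


def flip_flop_mmn(erps, tone_a, tone_b, block_1, block_2):
    """Average the two same-stimulus differences of a reversed-role design.

    Block 1: tone_a is the standard, tone_b the deviant; block 2: roles swapped.
    """
    diff_b = same_stimulus_difference(erps, tone_b, block_1, block_2)
    diff_a = same_stimulus_difference(erps, tone_a, block_2, block_1)
    return (diff_a + diff_b) / 2.0


# Toy three-sample "waveforms" (not real data).
erps = {("A", 1000): [0.0, 0.0, 0.0], ("A", 1100): [0.0, -2.0, -1.0],
        ("B", 1100): [0.0, 0.0, 0.0], ("B", 1000): [0.0, -1.0, -1.0]}
print(flip_flop_mmn(erps, 1000, 1100, "A", "B"))
```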

Furthermore, a control protocol introduced by Schröger and Wolff (1996) can be used to disentangle the N1 and ‘genuine’ MMN components. It allows one to separate the N1 and MMN mechanisms, for instance by controlling for differences in the refractoriness of the feature-specific afferent neurons when recording the MMN to frequency changes (Jacobsen and Schröger, 2001). Similar results were obtained for MMNs to changes in location, duration, and intensity (Schröger and Wolff, 1996, Jacobsen and Schröger, 2003, Jacobsen et al., 2003a, see also Jacobsen et al., 2003b). These studies provide strong evidence for the memory-based comparison account of the MMN.
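One commonly described form of such a control is assumed here; the cited papers should be consulted for the exact parameters. In this form, the deviant tone is presented in a separate block among several equiprobable tones, each occurring with the same probability that the deviant had in the oddball block. The deviant is then equally rare, and its afferent neurons are presumably in a similar state of refractoriness, but no repetitive standard exists. The helper name and the frequency list below are hypothetical.

```python
import random


def equiprobable_control(tones, n_per_tone, seed=None):
    """Shuffled control block in which every tone is equally probable."""
    sequence = [tone for tone in tones for _ in range(n_per_tone)]
    random.Random(seed).shuffle(sequence)
    return sequence


# Oddball block with p(deviant) = 0.1 -> ten equiprobable tones in the control.
control_tones = [900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350]
control = equiprobable_control(control_tones, n_per_tone=50, seed=2)
p_1100 = control.count(1100) / len(control)
```

Under this assumed design, a memory-based MMN estimate would be the ERP to the 1100-Hz tone in the oddball block minus the ERP to the same tone in the control block.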

The memory-trace interpretation is also supported, among other things, by the following results (for a review, see Näätänen et al., 2005):

These findings are among the central evidence for the cognitive, or memory, interpretation of the MMN: it is elicited when the sensory input does not “match” the representation of the preceding standard stimuli (see also Laufer and Pratt, 2005). Thus, in recording the MMN to deviant stimuli, one appears to be probing the neural representation of the standard stimulus, with the MMN response reflecting the detection of a difference between the deviant and standard auditory events and hence providing an objective index of auditory discrimination accuracy. Therefore, the MMN also provides, indirectly, a measure of the accuracy of the neural representation of the standard stimulus (see Näätänen and Alho, 1997). Consequently, the MMN opens a unique window onto the perceptual and memory functions of the auditory cortex.

The sensory information carried by the sensory-memory traces underlying the MMN generation indeed corresponds to sound perception and memory (and thus provides the central sound representation, CSR), rather than just to the acoustic elements composing the stimulus (Näätänen and Winkler, 1999). This was demonstrated, e.g., by Winkler et al. (1995) who found an MMN to a change of the missing fundamental, with both the standards and deviants being composed of varying combinations of the same simple tonal elements.

The results of several studies adding a backward-masking stimulus to the oddball paradigm (Winkler et al., 1992, Winkler et al., 1993, Bazana and Stelmack, 2002) also suggested that the MMN reflects sensory (echoic) memory in audition (for reviews of different forms of auditory sensory memory, see Cowan, 1988; see also Winkler and Cowan, 2004), with the traces underlying this memory being probed by presenting deviant stimuli (for a review, see Näätänen and Winkler, 1999).

As already mentioned, traces of a relatively short duration are involved in MMN generation (see, however, Cowan et al., 1993, Atienza et al., 2001, Atienza et al., 2004, Jääskeläinen et al., 1999a, Winkler and Cowan, 2004, Winkler et al., 1996a, Winkler et al., 1996b). In young, healthy subjects, these traces can last for about 5–10 s, judging from the ISIs that still permit MMN elicitation (Böttcher-Gandor and Ullsperger, 1992, Näätänen et al., 1987a, Mäntysalo and Näätänen, 1987, Sams et al., 1993; see also Ritter et al., 2002), but the trace duration gets shorter with aging (Jääskeläinen et al., 1999b, Pekkonen et al., 1996). Furthermore, this age-dependent shortening of the memory trace is expedited by chronic alcoholism (Polo et al., 1999, Grau et al., 2001), and these traces are very short in duration in patients with neurodegenerative brain diseases such as Alzheimer’s disease (Pekkonen et al., 1994, Pekkonen, 2000). Grau et al. (1998) developed a new paradigm to determine the duration of these traces in a much shorter time than is usually needed.

At the beginning of a stimulus block, the standard stimulus has to be repeated a few times before a deviant stimulus can elicit an MMN (Cowan et al., 1993, Winkler et al., 1996a, Winkler et al., 1996b). In addition, the MMN amplitude increases with the number of standards preceding a deviant (Sams et al., 1983, Imada et al., 1993a, Imada et al., 1993b, Baldeweg et al., 2004, Haenschel et al., 2005, Javitt et al., 1998). MMN elicitation early in a block may, however, be facilitated when the standard stimulus of the block is identical to that of the preceding block (Cowan et al., 1993, Winkler et al., 1996a, Winkler et al., 1996b). This suggests that the sensory-memory traces probed with the MMN in the traditional passive oddball paradigm might yield a conservative estimate of the duration of the short-term sensory memory trace in audition (see also Winkler and Cowan, 2004, Winkler, Kujala et al., 2003a). Moreover, subsequent studies showed that the MMN can also be used for probing long-term or permanent memory traces in audition, such as those serving speech perception or the recognition of familiar voices, as will be reviewed later in this article.
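Analyses of how MMN amplitude depends on the number of preceding standards begin by counting, for each deviant, the standards since the previous deviant (or since block onset). Below is a minimal sketch with a hypothetical helper. Averaging the deviant ERPs within bins of this count, and subtracting a standard ERP, would follow.

```python
def standards_before_each_deviant(sequence, standard):
    """For each deviant, the number of standards since the previous deviant."""
    counts = []
    run = 0
    for stimulus in sequence:
        if stimulus == standard:
            run += 1
        else:
            counts.append(run)
            run = 0  # a deviant interrupts the run of standards
    return counts


seq = ["S", "S", "D", "S", "D", "D", "S", "S", "S", "D"]
print(standards_before_each_deviant(seq, "S"))  # [2, 1, 0, 3]
```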

In addition, the rate at which standards are presented is, of course, important for trace formation and maintenance. With shorter ISIs between the standards, the MMN amplitude tends to be larger (Näätänen et al., 1987a, Sabri and Campbell, 2001, Javitt et al., 1998; see also Alain et al., 1994). For corroborating fMRI data, see Sabri et al. (2004).

3. Deviant-stimulus probability

The MMN amplitude decreases as the deviant-stimulus probability increases (Näätänen et al., 1987a, Ritter et al., 1992, Sabri and Campbell, 2001, Haenschel et al., 2005, Sato et al., 2000, Sato et al., 2002). This is partly because, with higher deviant probabilities, the standard stimulus is more often replaced by a deviant stimulus (and a replaced standard cannot contribute to trace strength). A more important factor, however, appears to be that with shorter intervals between deviant stimuli, the deviants develop a trace of their own, which in turn might inhibit MMN generation with respect to the initial standard (Sams et al., 1984, Ritter et al., 1992, Rosburg, 2004, Rosburg et al., 2004a, Rosburg et al., 2004b). In fact, an early account of the MMN (Näätänen, 1984) explained the MMN phenomenon in terms of the input from the eliciting deviant stimulus “starting” to develop a representation of its own in the auditory sensory-memory system already engaged by the representation of the preceding stimuli (cf. Donchin’s influential context-updating hypothesis of the P3 (Sutton et al., 1965); Donchin, 1981, Karis et al., 1984, Donchin and Coles, 1988, Donchin and Coles, 1991; see also Javitt et al., 1996, Winkler et al., 1996a, Winkler et al., 1996b, Sussman and Winkler, 2001). Subsequently, three different studies (Sato et al., 2000, Sato et al., 2002, Haenschel et al., 2005) found that the frontally recorded MMN is more sensitive to probability manipulation than is the auditory-cortex source (recorded from the mastoid with a nose reference; see a later section).
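Why a higher deviant probability shortens the intervals between deviants can be seen from a simplified model. Assume, for illustration only, that each stimulus is independently a deviant with probability p and that stimuli occur at a constant stimulus onset asynchrony (SOA). The number of stimuli K from one deviant to the next is then geometrically distributed, and the mean inter-deviant interval follows directly:

```latex
\begin{aligned}
P(K = k) &= (1 - p)^{k-1}\, p, \qquad k = 1, 2, \dots \\
\mathbb{E}[K] &= \frac{1}{p}, \qquad
\mathbb{E}[T_{\mathrm{dev}}] = \mathrm{SOA} \cdot \mathbb{E}[K] = \frac{\mathrm{SOA}}{p}
\end{aligned}
```

Take an assumed SOA of 500 ms. Then p = 0.1 gives a mean inter-deviant interval of 5 s, whereas p = 0.3 gives about 1.7 s, leaving the deviants far more opportunity to build up a trace of their own.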

When two deviants occur in a row, the MMN to the second deviant is smaller in amplitude than that to the first deviant (Sams et al., 1984, Müller et al., 2005b). The reduction of MMN amplitude is markedly smaller if the second of two successive deviants differs from the standards in a different attribute than the first deviant (Müller et al., 2005b, Nousak et al., 1996).
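Sorting deviants by what immediately precedes them is the first step in the comparison described above. The sketch below is hypothetical in several respects: the deviant-type labels ("freq", "dur"), the category names, and the treatment of block onset as a preceding standard are all illustrative assumptions.

```python
def classify_deviants(sequence, standard="std"):
    """Label each deviant by the stimulus immediately preceding it.

    Returns (index, label) pairs, where label is one of:
      'after_standard'      - preceded by a standard (or block onset),
      'after_same_deviant'  - preceded by a deviant of the same type,
      'after_other_deviant' - preceded by a deviant of a different type.
    """
    labels = []
    previous = standard  # block onset treated like a preceding standard
    for index, stimulus in enumerate(sequence):
        if stimulus != standard:
            if previous == standard:
                label = "after_standard"
            elif previous == stimulus:
                label = "after_same_deviant"
            else:
                label = "after_other_deviant"
            labels.append((index, label))
        previous = stimulus
    return labels


seq = ["std", "freq", "freq", "std", "dur", "freq", "std"]
print(classify_deviants(seq))
```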

3.1. Separation of the MMN from the other components

As already concluded, the MMN cannot be accounted for by an enhancement of the exogenous N1, even though the MMN usually overlaps the N1, at least partially, when the magnitude of stimulus change exceeds a certain limit (Tiitinen et al., 1994; for a review, see Näätänen et al., 2005). MEG recordings indicate that the supratemporal source of the MMN can be separated from the N1 source; these studies will be reviewed later. Moreover, robust MMNs (or analogous slow positive waves; Ruusuvirta et al., 2003, Winkler, Kujala et al., 2003a, Maurer et al., 2003a, Maurer et al., 2003b) to various kinds of auditory change can be recorded even in newborns, in whom no well-defined N1-type response is elicited (Alho et al., 1990, Cheour et al., 2002a, Cheour et al., 2002b, Cheour et al., 2002c, Pang and Taylor, 2000). In addition, almost normal-size MMNs to a change in a complex spectrotemporal stimulus, reflecting the effect of prior discrimination training, were present 3 days after the training during REM sleep, a state in which the N1 was almost completely abolished (Atienza and Cantero, 2001). Moreover, drug effects on the N1 and MMN amplitudes can be very different (e.g., Javitt et al., 1996). In addition, certain lesions may eliminate the MMN but leave the N1 intact (e.g., Alho et al., 1994b). Also, subjects can be trained to perform an initially very difficult discrimination, which is accompanied by the emergence and growth of the MMN with no corresponding effect on the N1 (Atienza et al., 2002b, Näätänen et al., 1993b).

When the sequence of standard and deviant sounds is attended, the MMN is usually elicited much as it is when the subject’s attention is directed elsewhere (although, as will be discussed later, under some conditions with attention highly focused elsewhere its amplitude may be somewhat attenuated). However, when a sound stream is attended, the MMN elicited by deviants in this stream is at least partially overlapped by the N2b (Näätänen et al., 1982; for a review, see Näätänen and Gaillard, 1983), which has a scalp distribution somewhat posterior to that of the MMN (Aulanko et al., 1993). Furthermore, unlike the MMN, which usually shows a polarity reversal when recorded from mastoid electrodes with a nose reference, the N2b shows no such polarity reversal. Hence, mastoid recordings can provide an estimate of the supratemporal component of the MMN without N2b contamination.
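The polarity-reversal criterion can be expressed as a simple check on difference-wave amplitudes measured in the same latency window at a frontal electrode (e.g., Fz) and at a mastoid, both referenced to the nose. The sketch below is hypothetical. Its function name and its definition of the reversal ratio, the absolute mastoid amplitude relative to the absolute Fz amplitude, are assumptions made for illustration.

```python
import math


def polarity_reversal(amp_fz_uv, amp_mastoid_uv):
    """Return (reverses, ratio) for nose-referenced difference-wave amplitudes.

    `reverses` is True when the frontal amplitude is negative and the mastoid
    amplitude positive, as expected for a supratemporal MMN generator; a
    component without reversal (such as the N2b) does not turn positive at the
    mastoid. `ratio` is |mastoid| / |Fz| (an assumed definition), or NaN if
    the Fz amplitude is zero.
    """
    reverses = amp_fz_uv < 0.0 < amp_mastoid_uv
    ratio = abs(amp_mastoid_uv) / abs(amp_fz_uv) if amp_fz_uv != 0.0 else math.nan
    return reverses, ratio


print(polarity_reversal(-3.0, 1.5))   # (True, 0.5)
print(polarity_reversal(-3.0, -0.5))  # no reversal at the mastoid
```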

The processing negativity (PN) elicited by an attended stimulus stream in the presence of a concurrent irrelevant stimulus stream (Näätänen et al., 1978, Alho et al., 1987, Näätänen, 1988, Näätänen, 1990; see also Hillyard et al., 1973, Hansen and Hillyard, 1983; for an MEG equivalent of the PN, see Hari et al., 1989b) may also overlap the MMN in response to deviants in the attended stimulus stream. The PN is, however, cancelled in the difference waves obtained by subtracting the ERP to standards from that to deviants within the attended channel (Alho et al., 1987). [Hansen and Hillyard’s (1983) “Nd” refers to the PN difference between ERPs to attended and unattended stimulus streams; see Alho et al., 1987.]
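The cancellation follows from a simple additive idealization. Suppose the ERP to standards in the attended channel is S + PN and the ERP to deviants is D + PN, with the same PN contribution to both. Then:

```latex
\begin{aligned}
\mathrm{ERP}_{\mathrm{dev}} - \mathrm{ERP}_{\mathrm{std}}
  &= (D + \mathrm{PN}) - (S + \mathrm{PN}) \\
  &= D - S
\end{aligned}
```

The PN thus drops out of the within-channel difference wave. A subtraction across channels (attended minus unattended ERPs, as in the Nd) retains it.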

3.2. MMN generators

The scalp-recorded MMN has its largest amplitude over the fronto-central scalp areas. The modeling of the generator sources of the MMN with equivalent current dipoles (ECD) suggests that the fronto-centrally predominant scalp distribution of the MMN is mainly explained by the sum of the activity bilaterally generated in the supratemporal cortices (Giard et al., 1995, Rinne et al., 1999b, Scherg et al., 1989, Jemel et al., 2002). This interpretation is supported by MMNm recordings (Hari et al., 1984, Levänen et al., 1996, Alho et al., 1998a, Alho et al., 1998b, Sams et al., 1991b, Csépe et al., 1992) which show signal maxima over the bilateral supratemporal cortices.

Moreover, intracranial MMN recordings in the mouse (Umbricht et al., 2005), guinea pig (Kraus et al., 1994a, Kraus et al., 1994b, King et al., 1995), cat (Csépe et al., 1987, Csépe et al., 1988, Csépe et al., 1989, Csépe, 1995, Pincze et al., 2001, Pincze et al., 2002), rat (Ruusuvirta et al., 1998, Astikainen et al., 2006; see, however, Eriksson and Villa, 2005, Lazar and Metherate, 2003), rabbit (Astikainen et al., 2000), monkey (Javitt et al., 1992, Javitt et al., 1996), and humans (Halgren et al., 1995, Halgren et al., 1998, Baudena et al., 1995, Kropotov et al., 1995, Kropotov et al., 2000, Liasis et al., 1999, Liasis et al., 2000a, Liasis et al., 2001, Rosburg et al., 2005) indicated MMN generation in the auditory cortex. These results also showed that the MMN generation locus can be separated from those of the N1 and the other afferent responses. For example, in humans, the MMNm to frequency change is generated in the supratemporal cortex 3–10 mm anterior to the N1m source (Sams et al., 1991b, Csépe et al., 1992, Hari et al., 1992, Huotilainen et al., 1993, Tiitinen et al., 1993, Levänen et al., 1993, Levänen et al., 1996, Liasis et al., 2000a, Alho et al., 1998a, Alho et al., 1998b, Korzyukov et al., 1999, Rosburg, 2003, Rosburg et al., 2004a; see also Rosburg et al., 2004b). A similar difference was also observed in the source localization of the electrically recorded MMN and N1 (Scherg et al., 1989). Optical recordings (Rinne et al., 1999b), too, show a clear separation between the N1 and MMN generators.
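Statements such as "3–10 mm anterior to the N1m source" compare equivalent-current-dipole locations along the anterior–posterior axis. Below is a minimal sketch. It assumes a head coordinate frame whose y axis points anteriorly, as in the Neuromag head coordinate system, where x points to the right and z upward. The dipole coordinates are hypothetical values in millimetres.

```python
import math


def dipole_offset(mmn_xyz_mm, n1_xyz_mm):
    """Return (anterior shift, 3-D distance) of the MMN(m) vs the N1(m) dipole.

    Assumes y is the anterior axis; a positive shift means the MMN(m)
    source lies anterior to the N1(m) source.
    """
    anterior_shift = mmn_xyz_mm[1] - n1_xyz_mm[1]
    distance = math.dist(mmn_xyz_mm, n1_xyz_mm)
    return anterior_shift, distance


# Hypothetical right-hemisphere dipole locations (mm), not measured data.
shift, dist = dipole_offset((52.0, 16.0, 51.0), (50.0, 10.0, 50.0))
```

Reporting the shift along a single axis separately from the full 3-D distance keeps "anterior" claims distinct from overall source separation.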

In addition, intracranial recordings in humans (Kropotov et al., 1995, Kropotov et al., 2000, Rosburg, 2003) indicate that the MMN to deviant sounds among standard sounds and the enhanced N1 response to infrequent sounds occurring with no intervening standard sounds are generated by different neuronal populations. Kropotov et al. (2000) found that one area of the temporal cortex (Area 22) gave a differential response to deviants (1300-Hz tone) among standards (1000-Hz tone) in the MMN latency range. However, this area did not respond differentially to these tones when each of them was presented alone in a sequence. Nor did it respond differentially to the 1000-Hz standard tone when that tone was presented alone at a slow (deviant-stimulus) rate or at a fast (standard-stimulus) rate. In contrast, frequency-dependent responses were recorded from Area 41, and responses to rarity per se (i.e., a larger response to deviants presented alone at their typical long intervals than to the same stimuli presented alone at short intervals) from Area 42 (cf. Ulanovsky et al., 2003). Hence, the existence of genuine change-dependent responses could be verified and their cortical origin separated from that of the cortical reception of the afferent input that gives rise to the frequency- and rate-specific cortical afferent responses such as the N1 (N1a; see Näätänen and Picton, 1987). For supporting evidence, see Liasis et al., 1999, Liasis et al., 2000a, Liasis et al., 2000b.

Intracranial animal data also support the separability of the MMN from the afferent responses. On the basis of their cat recordings, Pincze et al. (2001) concluded that the MMN was generated in the rostroventral part of the secondary auditory cortex, clearly separated from the P1 and N1 sources. Moreover, Javitt et al. (1996) found in the monkey that the NMDA-receptor antagonist MK-801 eliminated MMN-like responses to frequency and intensity deviants but left the afferent responses intact.

Furthermore, studies using the fMRI (Celsis et al., 1999, Opitz et al., 1999, Opitz et al., 2002, Opitz et al., 2005, Wible et al., 2001, Mathiak et al., 2002, Liebenthal et al., 2003, Schall et al., 2003, Downar et al., 2000, Doeller et al., 2003, Kircher et al., 2004, Molholm et al., 2005, Sabri et al., 2004, Sevostianov et al., 2002, Rinne et al., 2005), PET (Tervaniemi et al., 2000a, Dittmann-Balcar et al., 2001, Müller et al., 2002), and event-related optical signals (EROS) (Rinne et al., 1999b) also suggested MMN generation in the auditory cortex. In addition, Tse et al.’s (2006) very recent optical recordings of brain activity associated with preattentive changes in to-be-ignored sounds are consistent with this result. Moreover, the MMN was attenuated in amplitude in patients with brain lesions involving the auditory cortex (Aaltonen et al., 1993; Alain et al., 1998, Ilvonen et al., 2001, Ilvonen et al., 2003).

Importantly, there is evidence indicating that at least partially different neural populations in the auditory cortex are activated by different types of auditory change. This was already suggested by Paavilainen et al. (1991), who found different polarity-reversal ratios for the frequency, duration, and intensity MMNs. Subsequently, studies using dipole modeling of the MMN and MMNm sources reported differences of a few millimeters in location and/or differences in orientation between the sources of the MMN responses to intensity, frequency, ISI, or duration changes (Rosburg, 2003, Frodl-Bauch et al., 1997, Giard et al., 1995, Levänen et al., 1993, Levänen et al., 1996; see also Deouell and Bentin, 1998, Deouell et al., 1998, Doeller et al., 2003). Different, attribute-specific sources are also supported by data (e.g., Nousak et al., 1996) showing that the MMN to the second of two consecutive deviants is not attenuated when the two deviants differ from the standard in different attributes. Furthermore, a recent fMRI study (Molholm et al., 2005) found that frequency and duration changes activate different areas in both the supratemporal and frontal cortices. Moreover, unlike MMNs to changes in non-phonetic sounds, MMNs to phoneme changes were elicited with a larger amplitude in the auditory cortex of the left hemisphere than in that of the right hemisphere (Näätänen et al., 1997, Alho et al., 1998a, Shestakova et al., 2002b, Tervaniemi et al., 1999, Tervaniemi et al., 2000a, Rinne et al., 1999a). In addition, Alho et al. (1996) found that the MMN scalp distributions for changes in simple and complex sounds differ from each other in the right-hemisphere temporal cortex. (The quality of the left-hemispheric data did not permit similar comparisons.) See also Alain et al., 1999a, Alain et al., 1999b. There is also some evidence for a tonotopic organization of MMNm generation to frequency changes of a pure tone (Tiitinen et al., 1993).

The afore-reviewed results suggest that the MMN might be used to probe the functional sensory-memory organization of the auditory cortex (see also Alain et al., 1998). However, as evidenced by some studies reporting negative findings (i.e., no difference between the sources of different MMNs; e.g., Sams et al., 1991b), the mapping of the different MMN sources in auditory cortex has proven to be rather difficult. This might be partially due to individual differences in the organization of the auditory fields in the supratemporal cortex.

In addition to the bilateral supratemporal cortices, other cortical areas are also involved in MMN generation. Frontal-lobe involvement in MMN generation was already proposed by Näätänen et al. (1978) on the basis of scalp-potential recordings from only four channels. This suggestion (see also Näätänen and Michie, 1979) was supported by later scalp current density (SCD) analyses of the MMN scalp-potential distribution, which indicated an additional MMN source in the frontal lobes (Deouell et al., 1998, Giard et al., 1990; Yago et al., 2001a, Yago et al., 2001b). Frontal MMN sources were also suggested by studies using source-current modeling (Rinne et al., 2000, Waberski et al., 2001) and multi-dipole modeling (Jemel et al., 2002) techniques. Furthermore, frontal MMN sources were supported by intracranial ERP (Baudena et al., 1995, Liasis et al., 2001, Rosburg et al., 2005), PET (Dittmann-Balcar et al., 2001, Müller et al., 2002), and fMRI recordings (Celsis et al., 1999, Opitz et al., 2002, Doeller et al., 2003, Schall et al., 2003, Molholm et al., 2005, Rinne et al., 2005) as well as by developmental data (Gomot et al., 2000). The functional role of the frontal lobes in the processing of infrequent sound changes and in MMN generation remains poorly understood, however (see also Restuccia et al., 2005).

Some animal studies (Kraus et al., 1994a, Kraus et al., 1994b, Csépe et al., 1989; Ruusuvirta et al., 1995, Astikainen et al., 2005) also indicated responses to infrequent sound changes in the thalamus and hippocampus. It is not clear, however, whether these responses are genuinely related to sound-change processing or are caused by the rarity of the infrequent sound, like the enhanced N1 to infrequent sounds delivered with no intervening standard sounds. The absence of effects of hippocampal lesions on the MMN in humans (Alain et al., 1989) supports the latter alternative. In contrast, thalamic lesions attenuated the MMN amplitude (Mäkelä et al., 1998), suggesting that the thalamus is involved in generating genuine deviance-detection activity. In addition, several studies suggested a right parietal-lobe contribution to auditory change detection (Kasai et al., 1999, Levänen et al., 1996; Lavikainen et al., 1995, Schall et al., 2003, Molholm et al., 2005). It is possible, however, that the parietal activation, which seems to occur clearly later than the temporal one, is in fact related to processes other than the MMN, such as the generation of the P3a.

3.3. Perceptual streaming and stream segregation as indexed by the MMN

In experimental conditions with several frequent stimuli, MMN elicitation may show modularity. This is the case when stimuli form separate perceptual streams (Bregman, 1990), e.g., when two concurrent sound sequences are dichotically presented at a rapid rate. In this case, frequency or other deviants in each stream elicit an MMN relative to the standard of the same ear only, with no, or only very little, across-ear "cross-talk" (Praamstra and Stegeman, 1992, Ritter et al., 2000, McKenzie and Barry, 2006). This means that there can be two or more parallel memory traces with separate MMN elicitation in the same stimulus block.
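The notion of separate memory traces for separate streams can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each stream's standard is simply its most frequent tone and that a tone is compared only with tones of its own stream; the function name `stream_deviants` and all tone frequencies are hypothetical and are not taken from the studies cited.

```python
from collections import Counter


def stream_deviants(sequence):
    """Return indices of tones that deviate from their own stream's standard.

    `sequence` is a list of (stream_label, frequency_hz) pairs, e.g. tones
    alternating between the left and the right ear. Each stream keeps its own
    'standard' (its most frequent frequency), so a tone is compared only with
    tones of the same stream, never with those of the other stream.
    """
    counts = {}
    for label, freq in sequence:
        counts.setdefault(label, Counter())[freq] += 1
    standards = {label: c.most_common(1)[0][0] for label, c in counts.items()}
    return [i for i, (label, freq) in enumerate(sequence) if freq != standards[label]]


# Hypothetical dichotic block: 500-Hz standards in the left ear, 1000-Hz
# standards in the right ear, and one higher-pitched deviant in each ear.
left = [("L", 500)] * 9
left[4] = ("L", 550)
right = [("R", 1000)] * 9
right[6] = ("R", 1100)
block = [tone for pair in zip(left, right) for tone in pair]  # L, R, L, R, ...

print(stream_deviants(block))  # [8, 13]
```

Because the standard is estimated per stream, the 1000-Hz right-ear tones are never treated as deviants relative to the left-ear standard, which mirrors the absence of across-ear cross-talk described above.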

Moreover, it is also possible to create three parallel streams with separate MMN elicitation. Nager et al. (2003b) produced, in different conditions, either one, two, or three auditory streams defined by spatial position (different loudspeakers) and frequency. In all conditions, an MMN was elicited by deviants, which were shorter-duration tones with a slightly higher pitch than that of the other tones of the respective stream. This MMN was, however, of a lower amplitude in the 3-stream condition than in the 1-stream and 2-stream conditions. (Control experiments ruled out the possibility that this attenuation was due to differences in the stimulus rate.) This effect was interpreted as reflecting the capacity limits of auditory sensory memory, which result in less accurate traces of each stream as the number of concurrent stimulus streams increases.

Separate streams can also be created within the same locus of sound origin by introducing a major frequency difference between the two concurrent stimulus sequences, with each being presented at a relatively short ISI (Sussman et al., 1999, Yabe et al., 2001a, Yabe et al., 2001b, Winkler et al., 1992; Shinozaki et al., 2000). See also Winkler et al., 1996a, Ritter et al., 1995, Gomes et al., 1997, Fujioka et al., 2004, and Brattico et al., 2002a, Brattico et al., 2002b. Moreover, a moving sound source (the consecutive steps of a walking person) also forms a sound stream, a deviation in which elicits an MMN even in the presence of continuous loud environmental noise (Winkler et al., 2003c).

The studies reviewed above (see also Müller et al., 2005b) convincingly demonstrated that stream segregation precedes, and provides a prerequisite for, MMN elicitation (and thus, apparently, also for selective attention in audition; see Näätänen, 1990). Hence, the MMN is closely associated with the way the central auditory system organizes the incoming sounds: the sound organization determines the regularity on which MMN elicitation is based (see also Sussman et al., 1998a, Winkler et al., 2001, Alain et al., 2001, Alain et al., 2002; for reviews, see Alain et al., 1994; Näätänen et al., 2001, Näätänen and Winkler, 1999). These characteristics of the MMN, together with its relative independence of attention, make it ideal for testing whether a certain phenomenon of auditory perception depends on attention. Such an approach has been applied to the octave illusion (Ross et al., 1996), the continuity illusion (Micheyl et al., 2003), and the illusory conjunction of different stimulus features (Takegata et al., 2005a, Winkler et al., 2005).

4. Perceptual integration in audition as reflected by the MMN

4.1. Feature integration

Consequently, the traces involved in MMN elicitation seem to reflect feature-integrated sensory information underlying unitary auditory percepts (for a review, see Näätänen and Winkler, 1999). This was also shown by Gomes et al. (1997), who obtained an MMN by infrequently presenting a stimulus produced by conjoining features from two separate frequent stimuli. Subjects were presented with a series of 4 different tones. Three of the tones (standards) were each delivered with a probability of 30%, each differing from the others in both frequency and intensity. The deviant tone (probability of 10%) matched the frequency of one standard and the intensity of another. Since each feature of the deviant tone was also present in one of the standards, the MMN elicited by the deviant stimulus could only be due to its infrequent feature conjunction. (See also Nousak et al., 1996, Deacon et al., 1998.)
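The logic of the Gomes et al. (1997) design, in which every feature of the deviant is familiar but its combination is not, can be made explicit with a short Python sketch. The tone frequencies and intensities below are hypothetical placeholders (the actual values are not given here), and the helper `is_conjunction_only_deviant` is an illustrative name, not part of any published analysis.

```python
import random


def is_conjunction_only_deviant(deviant, standards):
    """True if each feature value of `deviant` occurs in some standard,
    while no single standard has the same combination of values."""
    every_feature_familiar = all(
        any(std[feature] == value for std in standards)
        for feature, value in deviant.items()
    )
    combination_is_new = all(std != deviant for std in standards)
    return every_feature_familiar and combination_is_new


# Hypothetical values chosen only to reproduce the structure of the design.
standards = [
    {"freq_hz": 1000, "level_db": 60},
    {"freq_hz": 1500, "level_db": 70},
    {"freq_hz": 2000, "level_db": 80},
]
# Frequency of standard 1 combined with the intensity of standard 2.
deviant = {"freq_hz": 1000, "level_db": 70}

# Stimulus probabilities as described: 30% per standard, 10% for the deviant.
rng = random.Random(0)
sequence = rng.choices(standards + [deviant], weights=[0.3, 0.3, 0.3, 0.1], k=1000)

print(is_conjunction_only_deviant(deviant, standards))  # True
```

A deviant with a frequency absent from all standards would fail the first check, which is exactly the kind of simple-feature deviance the design was built to exclude.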

Such feature-integrated representations in audition are formed without focal attention to stimuli. Winkler et al. (2005) found that MMNs to rare combinations of acoustic features were quite similar irrespective of the modality and direction of focal attention. This result challenges the extension to the auditory modality of Treisman's influential feature-integration theory (Treisman and Gelade, 1980, Treisman, 1988, Treisman and Sato, 1990), which holds, on the basis of visual studies, that the correct conjoining of features requires focal attention. However, the auditory MMN studies on feature integration reviewed above used sequentially presented sounds only, whereas the visual studies on feature integration used parallel presentation. Takegata et al. (2005a) recently found, however, that the MMN responses to rare conjunctions of auditory features did not depend on attention even when an array of spatially distributed concurrent sounds, analogous to the stimulus displays of visual studies, was used.

The MMN has also been used to test the attentional dependence of the illusory conjunction of features in audition (Takegata et al., 2005a; Winkler et al., 2005).

Regularities at different levels (such as those involving acoustic features, conjunctions of features, and sequential patterns of sounds) appear to be represented in parallel. Takegata et al. (1999) found the parallel elicitation of MMNs to changes in a single acoustic feature and to changes in the conjunction of features. This result was supported by their MEG data (Takegata et al., 2001a, Takegata et al., 2001b, Takegata et al., 2001c) showing that MMNm elicitation for two simple acoustic features (frequency and location) and that for the conjunction of these features activated at least partially distinct neural populations. Consistent with this, Alain et al. (1999a, 1999b) and Alho et al. (1996) found that different neuronal populations appeared to be involved in MMN generation for a simple acoustic feature (frequency) and for a sequential pattern formed by tones of different frequencies.

Thus, the memory traces underlying MMN elicitation range from the level of static acoustic features to that of the conjunction of these features into unitary sounds and to higher-order spatiotemporal patterns. Näätänen and Winkler (1999) distinguished two stages, or levels, of output in central auditory processing: (a) sensory feature traces that correspond to the separate features of the stimulus; and (b) the unitary stimulus representation of the full auditory event. Depending on the structure and parameters of the stimulus sequence, an MMN elicited by a certain acoustic change can reflect memory traces at any one of these levels, or a combination of them (Ritter et al., 1995; Schröger, 1997).

4.2. Stimulus grouping

When sounds are presented at a fast rate, stimuli may be grouped on the basis of their temporal or spatial proximity or of some repetitive pattern of stimulation. Scherg et al. (1989), using a stimulus-onset asynchrony (SOA) of 0.9 s, found no difference in the MMN amplitude for a frequency change between regular (deviant, D, every fifth stimulus) and randomized stimulus delivery (D occurring at p = .20). However, Sussman et al. (1998b), who replicated the Scherg et al. (1989) result with an SOA of 1.3 s, found that this frequency MMN in the regular condition disappeared when the SOA was made very short (100 ms). The authors concluded that the fast stimulus rate enabled the central auditory system to group all 5 stimuli (SSSSD, with S denoting the standard) together as a unit, which effectively formed the regular, standard stimulus event. That is, with this short SOA, several of these 5-stimulus sequences could be simultaneously represented in auditory sensory memory, where they therefore formed the standard event; the frequency deviants were, in fact, part of the 5-tone standard. [A control block showed that an MMN was elicited when the short-SOA (100 ms) stimulus block was presented in a randomized stimulus order.] With the 1.3-s SOA, the stimulus rate may have been too slow for such grouping. See also Alain et al., 1994, Alain et al., 1999a, Alain et al., 1999b and Alain and Izenberg (2003).
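Why grouping abolishes the MMN can be illustrated by comparing one stimulus sequence against two candidate representations of the standard: a single repeated tone, or a repeating 5-tone unit. The Python sketch below is an illustrative toy under that assumption only; it does not model the SOA itself (which, according to the account above, determines which representation is formed), and the function names are hypothetical.

```python
import random


def deviants_vs_single_tone(seq, standard="S"):
    """Positions that differ from a standard defined as one repeated tone."""
    return [i for i, tone in enumerate(seq) if tone != standard]


def deviants_vs_unit(seq, unit="SSSSD"):
    """Positions that break a standard defined as a repeating multi-tone unit."""
    return [i for i, tone in enumerate(seq) if tone != unit[i % len(unit)]]


# Regular delivery: the deviant is every fifth stimulus.
regular = "SSSSD" * 20
# Randomized delivery: each stimulus is a deviant with p = .20.
rng = random.Random(1)
randomized = "".join("D" if rng.random() < 0.2 else "S" for _ in range(100))

print(deviants_vs_single_tone(regular)[:3])  # [4, 9, 14]
print(deviants_vs_unit(regular))             # []
```

Against a single-tone standard, every fifth stimulus of the regular sequence is a deviant; against the grouped SSSSD standard, the same sequence contains no violation at all, whereas the randomized sequence still does.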

Recently, Sussman (2005) demonstrated that in the presence of multiple stimulus streams, the processes grouping and integrating sounds into perceptual units, as judged from the MMN data, take place modularly within the already formed streams. This result implies, for example, that speech intelligibility in the presence of two or more speakers is supported by automatic within-channel grouping and, further, that such integration processes distinguish speech units separately within each stimulus stream (see later). (See also Fujioka et al., 2005.) For further MMN studies on the temporal grouping of auditory stimuli, see Müller and Schröger, 2007, Müller et al., 2005a, Müller et al., 2005b, Takegata et al., 2005b.

4.3. Audio–visual integration

The MMN can also be used to probe the neural processes underlying auditory–visual integration, in particular, whether the activity in the auditory cortex is modulated by visual stimuli. Sams et al. (1991a), in a pioneering study, compared the MEG responses to rare, incongruent audio–visual stimuli (acoustic /pa/ presented simultaneously with visual /ka/), perceived as a different syllable (/ta/; the McGurk effect; McGurk and MacDonald, 1976), with those to congruent, frequent stimuli (acoustic /pa/ paired with visual /pa/). Even though the acoustic parts of the stimuli were identical, the rare stimuli nevertheless elicited an MMNm with a clear activation in the auditory cortex (see also Colin et al., 2002a, Colin et al., 2004a, Colin et al., 2004b, Möttönen et al., 2002, Ullsperger et al., 2006; see, however, Besle et al., 2005). In addition, Colin et al. (2002b) and Stekelenburg et al. (2004) found that an MMN accompanied an illusory shift in sound location (the ventriloquist illusion). For possible neural mechanisms of multisensory convergence during early cortical processing, see Foxe and Schroeder, 2005, Besle et al., 2005, Molholm et al., 2002, Giard and Peronnet, 1999, Bernstein et al., 2002.

4.4. Temporal integration

Näätänen and Winkler (1999) proposed that the central sound representation (CSR) is formed when the outcomes of the different parallel feature-specific processes cumulate on the neural mechanisms of auditory sensory memory. The neural sensory-memory trace that then emerges (this phase underlying perception) contains the highly stimulus-specific, feature-integrated sensory information that is present in perception and sensory memory. This integration process presumably uses a sliding temporal window of some 150–200 ms in duration, i.e., the temporal window of integration (TWI; Näätänen, 1990). The emerging central sound representation, the CSR, usually enters conscious perception but may also remain subjectively silent (Näätänen, 1992). During the TWI, acoustic stimulation from the same source or channel (i.e., of similar acoustic parameters and approximately the same spatial origin; see Shinozaki et al., 2003) is integrated into a unitary auditory percept.

The estimate of the TWI duration is based on both behavioural and MMN data. Behavioural studies showed that loudness summation for brief sounds continues up to durations of about 200 ms (Moore, 1989, Scharf and Houtsma, 1986; Zwicker and Fastl, 1990). Furthermore, backward recognition-masking studies (Cowan, 1984, Foyle and Watson, 1984, Hawkins and Presson, 1977, Hawkins and Presson, 1986, Massaro, 1970) found that when the mask followed the onset of a brief test stimulus (to be identified by the subject) by less than 150–200 ms, the stimulus features were incompletely perceived. Hence, it appeared that about 150–200 ms from stimulus onset is needed for the trace development to be completed, and thus for a fully elaborated percept to emerge. For supporting MMN data from the backward-masking paradigm, see Winkler and Näätänen, 1992, Winkler and Näätänen, 1994 and Winkler et al. (1993).

Furthermore, an infrequent stimulus omission elicits an MMN only when the constant SOA is shorter than 150 ms (Yabe et al., 1997, Yabe et al., 1998; see also Tervaniemi et al., 1994b). Yabe et al. (1997) suggested that auditory input is processed in circa 150–170-ms temporal segments (the TWI); therefore stimulus omission from this time segment initiated by the preceding stimulus elicited an MMN, whereas stimulus omission occurring thereafter did not (i.e., with SOAs >150 ms). Corroborating evidence was provided by Yabe et al., 2001a, Yabe et al., 2001b, Yabe et al., 2005a, Yabe et al., 2005b, Wang et al., 2005a.

Schröger (1997) developed an ingenious way to estimate the duration of the trace-formation process. He presented a location deviant as the last stimulus of a train of 4 or 11 stimuli, each 30 ms in duration. When the silent ISI between consecutive stimuli was 10 ms, no MMN was elicited by the last stimulus of a train of 4 stimuli, whereas an MMN was elicited when the ISI was 170 ms. Furthermore, the MMN was elicited by the last stimulus in the 11-stimulus trains irrespective of the ISI. Thus, the time needed for trace formation, i.e., the TWI duration, seems to be longer than 120 ms but shorter than 400 ms.
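The 120-ms and 400-ms bounds can be recovered with simple arithmetic if one assumes (our reading, not stated explicitly above) that the relevant interval is the onset-to-onset span of the train preceding the deviant, i.e., (n − 1) × SOA with SOA = 30 ms + ISI. The Python sketch below applies that assumption to the four conditions; the helper name `train_span_ms` is hypothetical.

```python
def train_span_ms(n_stimuli, tone_ms, isi_ms):
    """Onset-to-onset time from the first tone of the train to the last (deviant) tone."""
    soa_ms = tone_ms + isi_ms
    return (n_stimuli - 1) * soa_ms


TONE_MS = 30
# (train length, ISI in ms) -> was an MMN elicited by the final tone?
results = {(4, 10): False, (4, 170): True, (11, 10): True, (11, 170): True}

spans = {cond: train_span_ms(cond[0], TONE_MS, cond[1]) for cond in results}
# Longest train that did NOT yet support an MMN: trace formation needs more time.
lower_ms = max(spans[c] for c, mmn in results.items() if not mmn)
# Shortest train that DID support an MMN: trace formation was complete by then.
upper_ms = min(spans[c] for c, mmn in results.items() if mmn)

print(lower_ms, upper_ms)  # 120 400
```

Under this reading, the 4-tone train with 10-ms ISIs spans 3 × 40 = 120 ms (no MMN), while the 11-tone train with 10-ms ISIs spans 10 × 40 = 400 ms (MMN), which brackets the trace-formation time.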

The TWI may also be crucial for speech perception and musical experience. Both depend on the simultaneous psychological presence of auditory stimulation from a time window of some short duration rather than from any single moment. Thus, auditory perception does not correspond to the immediately present acoustic reality (corrected for the delay caused by auditory processing) but rather to the outcome of temporal integration over the immediate past of some 150–200 ms (the TWI). Näätänen (1990) proposed that the continuously sliding temporal window of integration actually provides this temporally stretched psychological presence. According to him, the TWI may considerably expand in time "the 'psychological presence' in audition relative to the timeless 'cutting edge' of physical presence that continuously turns the future into the past" (p. 275).

Further support for the TWI was provided by Winkler et al. (1998), whose standard stimuli started as a sinusoidal tone of a single frequency but terminated as a frequency glide of 50 ms in duration. The deviant differed from the standard in two features: stimulus intensity and the direction of the frequency glide. The deviant elicited two temporally separate MMNs when the sinusoidal part of the complex sound was 250 ms in duration, that is, when the glide (the second deviant feature) started 250 ms after the onset of the first (intensity) deviance in the deviant sound. However, only a single MMN (corresponding in latency to that elicited by intensity deviance) was elicited by the deviant sound when the pure-tone part of the complex sound was 150 ms, i.e., when the onsets of the two deviations in the deviant sound were separated by only 150 ms. This result suggests that two temporally separate deviations from the same standard are treated as a single deviant event if they occur within the same TWI.
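The grouping rule suggested by these results, namely that deviations whose onsets fall within one TWI form a single deviant event, can be sketched in Python as follows. This is an illustrative toy, not a model proposed by Winkler et al. (1998): the 200-ms default is a hypothetical choice from the 150–200 ms range quoted above, and it must exceed 150 ms for the 150-ms condition to yield a single event.

```python
def count_deviant_events(deviation_onsets_ms, twi_ms=200):
    """Count deviant events when deviations are grouped by a temporal window.

    A deviation starting less than `twi_ms` after the onset of the current
    event is merged into that event; a later deviation opens a new event.
    """
    events = 0
    event_onset = None
    for onset in sorted(deviation_onsets_ms):
        if event_onset is None or onset - event_onset >= twi_ms:
            events += 1
            event_onset = onset
    return events


# Intensity deviance at 0 ms; glide deviance starting 150 or 250 ms later.
print(count_deviant_events([0, 150]))  # 1 (a single MMN)
print(count_deviant_events([0, 250]))  # 2 (two separate MMNs)
```

Anchoring each window at the onset of the first deviation, rather than sliding it continuously, is a simplification chosen to mirror the two-deviation design.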

Corroborating evidence using two other sound features, viz., tone frequency and duration, was provided by Czigler and Winkler (1996) and Winkler and Czigler (1998). Moreover, Sussman et al. (1999) obtained compatible results when the two deviations (one in frequency, the other in intensity) occurred in two separate consecutive short tones.

On the basis of this and other evidence, Näätänen and Winkler (1999) concluded that sensory-memory trace formation is a fast, automatic process that is completed by about 200 ms from stimulus onset (the TWI) and that underlies (provides the specific sensory-informational contents for) the transient percept of the sound. In this process, both parallel (feature integration using the outputs from the different feature-analyzer systems) and sequential (temporal) integration of sensory information occur. The authors stressed, however, that although this temporal-integration process sometimes results in information loss, as in detection masking and recognition masking, the integration is usually constructive in nature, structuring or segmenting the auditory perceptual world (Bregman, 1990). See also Sussman et al., 2002a, Sussman et al., 2002b, Sussman, 2005. "It is expected that a smooth (frequency) transition would prevent the backward masking of recognition of segments within the sound" (Cowan, 1984, p. 359). Such transitions form an essential element of natural sounds, such as speech and music. Therefore, according to Näätänen (1995), rather than condensing the time dimension, temporal integration binds together temporally closely spaced events at the perceptual level (e.g., the integration of two closely spaced tones to form a single perceptual event; Csépe et al., 1997, Loveless et al., 1994, Loveless et al., 1996, Tervaniemi et al., 1994b; Winkler and Näätänen, 1994).

Moreover, recent MMNm studies of the TWI by Yabe and colleagues (Yabe, 2002, Yabe et al., 1998, Yabe et al., 2001a, Yabe et al., 2001b, Yabe et al., 2005a, Yabe et al., 2005b) have significantly furthered our understanding of its nature and properties. In addition, related studies were conducted by Sussman et al., 2002a, Sussman et al., 2002b, Sussman and Winkler, 2001, Sussman, 2005. Furthermore, Wang et al. (2005a) found that the TWI duration, as indexed by the MMN, is almost twice as long in young children (5–8 years) as in adults but gradually shortens with maturation. Interestingly, Rüsseler et al. (2001), using the Yabe et al. (1997) stimulus-omission paradigm, obtained MMN results suggesting that musicians, too, have a prolonged TWI, longer than that of non-musicians. One might wonder whether the age-related decrease in the TWI duration is slower in musicians than in others.

Furthermore, Shinozaki et al. (2003), recording the MMNm to infrequent omissions of the second tone in repetitive tone pairs composed of two closely spaced tones of different frequencies, extended the concept of the TWI to involve even the spectral proximity of stimuli. Their spectro-temporal window of integration (STWI) integrated only those stimuli that fell within the same TWI and were also rather similar to each other in frequency (see also Yabe et al., 2001b), although it tolerated larger frequency differences in the early than in the late phase of the TWI. The authors concluded that this study provides "the first neurophysiological evidence of the two-dimensional (spectrotemporal) window of integration existing in the human brain. Two stimuli presented in close succession in the time-by-frequency surface might be presented in the auditory system as a unitary integrated event. This auditory spectrotemporal integration should play an indispensable role for the neural processing of complex signal sequence such as vocalizations, speech and music" (Shinozaki et al., 2003, p. 570).
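The qualitative shape of the STWI (a temporal window whose frequency tolerance narrows toward its end) can be expressed as a toy decision rule. In the Python sketch below, every number is a hypothetical placeholder: the 150-ms window, the 6- and 1-semitone tolerances, and the linear narrowing are illustrative assumptions, not estimates reported by Shinozaki et al. (2003).

```python
import math


def integrated_in_stwi(dt_ms, f1_hz, f2_hz, twi_ms=150.0,
                       tol_early_st=6.0, tol_late_st=1.0):
    """Toy rule: does a second tone fuse with the first into one event?

    The second tone must start within the temporal window, and its frequency
    distance from the first tone (in semitones) must not exceed a tolerance
    that shrinks linearly from `tol_early_st` at window onset to
    `tol_late_st` at the window's end. All defaults are hypothetical.
    """
    if not 0 <= dt_ms < twi_ms:
        return False  # outside the temporal window: never integrated
    tolerance = tol_early_st + (tol_late_st - tol_early_st) * (dt_ms / twi_ms)
    distance = abs(12 * math.log2(f2_hz / f1_hz))
    return distance <= tolerance


print(integrated_in_stwi(20, 1000, 1200))   # True: ~3.2 semitones, early in the window
print(integrated_in_stwi(130, 1000, 1200))  # False: same distance, late in the window
```

Expressing the frequency distance in semitones is itself a modelling choice made here for convenience; any monotonic spectral-distance measure would serve the same illustrative purpose.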

5. Automaticity of MMN elicitation

As already mentioned, MMN generation is an automatic brain process in the sense that its occurrence does not depend on attention (Näätänen et al., 1978, Näätänen and Michie, 1979, Alho et al., 1989, Alho et al., 1994a, Alain et al., 1994; see also Muller-Gass et al., 2005). Hence, the MMN is elicited even when attention is strongly focused on a concurrent auditory stimulus stream. Under such conditions, however, the MMN amplitude may be somewhat attenuated (see Woldorff et al., 1991, Woldorff et al., 1998, Alain and Izenberg, 2003, Alain and Woods, 1994, Alain and Woods, 1997; Näätänen et al., 1993a, Dittmann-Balcar et al., 1999, Arnott and Alain, 2002, Trejo et al., 1995, Muller-Gass et al., 2005, Woods et al., 1992, Woods et al., 1994, Kathmann et al., 1999, Oades and Dittmann-Balcar, 1995, Alho et al., 1992, Otten et al., 2000). Some of these studies (e.g., Arnott and Alain, 2002, Woldorff et al., 1991) even claimed that the MMN was totally abolished when attention was very strongly focused on another auditory input stream; judging from the data presented, however, a small residual MMN may nevertheless have remained (for a discussion, see Näätänen, 1991; see also Alain and Izenberg, 2003). In a number of other studies, attention to another auditory stimulus stream had no influence on the MMN amplitude (e.g., Alho et al., 1989, Alho et al., 1994a, Sussman et al., 2003b, Näätänen et al., 1978, Näätänen et al., 1982, Kaukoranta et al., 1989, Lounasmaa et al., 1989, Paavilainen et al., 1993b).

It is also possible that attention effects on the MMN amplitude depend on the magnitude of stimulus change, with MMNs to smaller changes being more susceptible to attentional influences than those to larger changes (Müller et al., 2002), and on the attribute of stimulus deviation (Näätänen, 1991, Näätänen et al., 1993a, Muller-Gass et al., 2005). For instance, in Woldorff et al.'s (1991) study, the intensity MMN was much more sensitive to the withdrawal of focused attention than was the frequency MMN (for corroborating data, see Näätänen et al., 1993a). In addition, the (right-hemisphere) frontal source of the MMN appears to be more sensitive to attentional effects than the temporal source (Restuccia et al., 2005). Furthermore, Näätänen (1991) proposed a division of the neurons involved in the MMN process into (1) computational and (2) amplifying ones, suggesting that it is the latter that might be modulated by attention. He also concluded that, in general, the MMN is sufficiently independent of attention to justify its use as an objective measure of central auditory processing.

When deviance-related difference waves are compared between the attended and unattended stimulus streams, component overlap poses a problem: in the attended channel, deviants also elicit the N2b (Näätänen et al., 1982; see Näätänen, 1991) and may also elicit the processing negativity when the temporal probability of the deviant exceeds a certain limit (Näätänen, 1990, Näätänen, 1992). For instance, in Woldorff et al.'s (1991) study, the presence of the N2b is suggested by the scalp distribution of the attentional enhancement. In particular, the considerably larger Cz than Fz amplitudes of the difference waves (e.g., their Fig. 4) seem to indicate the presence of the N2b component, which is generally larger in amplitude centrally than frontally, whereas the MMN is usually larger frontally.
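The deviant-minus-standard difference wave and the Cz-versus-Fz comparison described above can be sketched in a few lines of plain Python. The epochs below are synthetic, the 100–250 ms search window is a hypothetical choice, and the final comparison is only the rough scalp-distribution heuristic mentioned in the text, not a substitute for a proper component analysis.

```python
def average(epochs):
    """Sample-by-sample mean across epochs (each epoch is a list of µV values)."""
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard ERP difference wave."""
    return [d - s for d, s in zip(average(deviant_epochs), average(standard_epochs))]


def peak_negativity(wave, times_ms, window_ms=(100, 250)):
    """Most negative value of `wave` within a (hypothetical) latency window."""
    return min(v for v, t in zip(wave, times_ms) if window_ms[0] <= t <= window_ms[1])


# Synthetic data: flat standard epochs and a negative deflection in deviant epochs.
times = list(range(0, 400, 10))
flat = [0.0] * len(times)


def bump(depth_uv):
    return [-depth_uv if 150 <= t <= 200 else 0.0 for t in times]


fz = peak_negativity(difference_wave([bump(2.0), bump(2.0)], [flat, flat]), times)
cz = peak_negativity(difference_wave([bump(4.0), bump(4.0)], [flat, flat]), times)
print(fz, cz)  # -2.0 -4.0
print("Cz > Fz: possible N2b overlap" if abs(cz) > abs(fz) else "Fz >= Cz: MMN-like")
```

With real recordings, the same comparison would be made on averaged difference waves from each electrode and attention condition; the synthetic values here merely exercise the functions.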


Fig. 4. A model illustrating attention and automaticity in auditory processing. For explanation, see text. Adapted, with permission, from Näätänen (1990).

Importantly, Ritter et al. (1999) proposed that the MMN source itself might well be attention-independent but nevertheless generate an attenuated MMN because attention reduces the sensory inflow in the unattended channel (Woldorff et al., 1987, Woldorff et al., 1991, Woldorff and Hillyard, 1991). In this case, one would predict a constant MMN/N1 amplitude ratio (with the N1 indexing the magnitude of that inflow) rather than a constant MMN amplitude per se across the different attention conditions.
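The prediction that follows from Ritter et al.'s (1999) proposal can be stated as a simple check: compute the MMN/N1 amplitude ratio per attention condition and ask whether it stays constant even when the MMN amplitude does not. The Python sketch below is illustrative only; the amplitudes are made-up numbers chosen to exercise the functions, and the 10% tolerance is an arbitrary, hypothetical criterion.

```python
def mmn_n1_ratios(conditions):
    """MMN/N1 amplitude ratio per attention condition.

    `conditions` maps a condition name to (mmn_amplitude, n1_amplitude).
    """
    return {name: mmn / n1 for name, (mmn, n1) in conditions.items()}


def ratio_is_constant(conditions, rel_tol=0.1):
    """True if every ratio lies within `rel_tol` (relative) of the mean ratio."""
    ratios = list(mmn_n1_ratios(conditions).values())
    mean = sum(ratios) / len(ratios)
    return all(abs(r - mean) <= rel_tol * abs(mean) for r in ratios)


# Made-up amplitudes (µV): the MMN halves, but so does the N1.
example = {"attended": (-3.0, -6.0), "unattended": (-1.5, -3.0)}
print(mmn_n1_ratios(example))      # {'attended': 0.5, 'unattended': 0.5}
print(ratio_is_constant(example))  # True
```

In such a pattern, the MMN amplitude changes across conditions while the ratio does not, which is the signature the inflow-reduction account predicts.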

In addition, Sussman et al. (2003c) suggested a biased-competition explanation for these findings: changes occurring in the same feature in two auditory channels would compete to be processed by the MMN system. When the change in one channel is designated as the target in an active discrimination task, the competition for the feature-specific MMN resource would become biased, that is, changes in the target channel would be given priority in accessing the MMN system. This would cause the MMN for the target feature to be attenuated, or even eliminated, in the unattended channel. However, as suggested by Sussman et al.'s (2003c) data, the competition is biased only for the target feature, not for other auditory features (i.e., there is no general MMN suppression in the unattended channel).

Several studies also manipulated the visual task load but usually found no attentional modulation of the MMN amplitude (Otten et al., 2000, Alho et al., 1992, Alho et al., 1994a, Dittmann-Balcar et al., 1999, Dyson et al., 2005, Kathmann et al., 1999, Müller et al., 2002, Muller-Gass et al., 2005, Muller-Gass et al., 2006, Harmony et al., 2000, Sussman et al., 2005, Winkler et al., 2005, Takegata et al., 2005a; see, however, Kramer et al., 1995, Yucel et al., 2005a, Yucel et al., 2005b, Woods et al., 1992). Results in the opposite direction have even been reported: Zhang et al. (2006) found that with increasing visual attentional load, the MMN amplitude for a frequency change in irrelevant background stimulation became larger (whereas the subsequent P3a response became smaller, indicating decreased involuntary attention switching with increased visual task load).

Further, attention may affect the MMN indirectly by affecting the regularity maintained in auditory sensory memory. Sussman et al. (2002a) found that a frequency change (tone 2) occurring regularly as every fifth tone in a sequence of otherwise repetitive tones (tone 1) elicited an MMN both when subjects were instructed to attend to the pitch of the stimuli (to detect an occasional tone of yet another frequency, tone 3) and when they ignored the stimuli. However, when subjects were informed about the overall structure of the stimulus sequence and instructed to respond to pattern violations, no MMN was elicited by the frequency deviants. This suggests that in the Attend-pattern condition, tone 2 was processed as a member of the standard, which was represented as a repeating tone pattern (i.e., 11112). Thus, the data suggest that top-down effects occur at the level of stimulus encoding, affecting the sensory information used in the MMN process (see also Sussman and Gumenyuk, 2005, Sussman and Steinschneider, 2006, Sussman et al., 1998a, Sussman et al., 1998b, Sussman et al., 2002b, Sussman et al., 2003a, Sussman et al., 2003b). Additionally, the full correspondence found between the perceptual sound organization (assessed by the N2b data and the subjective reports) and the sound organization inferred from the MMN results is consistent with the hypothesis that the sound representations involved in the MMN-generating process also underlie sound perception (Näätänen and Winkler, 1999). Similar effects were also obtained when subjects selectively attended to one of the concurrent auditory streams (Sussman et al., 2003b, Sussman et al., 2005).

As for the selective-attention effects on longer-latency ERP components, they are robust: during strong attentional focus elsewhere, all ERP components following the MMN in response to deviants in the unattended stream, such as the P3a and the subsequent negativities, are strongly attenuated or totally abolished (Otten et al., 2000). In these conditions, the MMN-generator activation apparently did not succeed in triggering an attention switch and further processing in the unattended channel (see also Näätänen, 1992, Berti et al., 2004). Consistent with this, the Zhang et al. (2006) study reviewed above (for corroborating results, see Duncan and Keye, 1987, Harmony et al., 2000; Yucel et al., 2005a, Yucel et al., 2005b) showed that the P3a, an index of the occurrence of an attention switch (Squires et al., 1975, Escera et al., 1998, Knight, 1996), is elicited by unattended-stream deviants when the attention-demanding task performed by the subject is easy but not when it is difficult. This suggests that the threshold for an involuntary attention switch is elevated when the task is more difficult.

Furthermore, the automaticity of the MMN is strongly supported by its elicitation even in comatose patients (e.g., patients with a Glasgow Coma Scale score < 8 in Fischer et al., 2004, Fischer et al., 2006). Such MMN elicitation may occur when widely deviant stimuli are presented to patients who will later regain consciousness (Kane et al., 1993, Kane et al., 1996; Fischer et al., 1999, Fischer et al., 2004, Fischer et al., 2006, Luauté et al., 2005, Morlet et al., 2000). These data suggest that the MMN can be used as a tool in coma-outcome prediction. Moreover, the MMN can be elicited even under anaesthesia (Csépe et al., 1989, Koelsch et al., 2006, Heinke et al., 2004, Heinke and Koelsch, 2005, Yppärilä et al., 2002), is also elicited at certain sleep stages in adults (Sallinen et al., 1994, Sallinen et al., 1996, Sallinen and Lyytinen, 1997, Campbell and Colrain, 2002, Campbell et al., 1988, Campbell et al., 1991, Atienza et al., 1997, Atienza et al., 2000, Atienza et al., 2001, Atienza et al., 2002a; Atienza and Cantero, 2001, Nielsen-Bohlman et al., 1991, Nashida et al., 2000, Nittono et al., 2001), and, irrespective of the sleep stage, in newborns (Cheour et al., 2000, Ceponiene et al., 2002; see also Friederici et al., 2002).

Consistent with this, the MMN is usually not affected by the predictability of deviant stimuli. As already reviewed, Scherg et al. (1989) found that the MMN amplitude was not affected by whether deviants (20%) occurred regularly (every fifth stimulus in a block) or randomly in the stimulus block (the SOA was 1 s) (see also Jankowiak and Berti, 2007). [However, as already mentioned, when Sussman et al. (1998b) shortened the SOA so that the deviant was perceptually grouped with the 4 preceding standards, then no MMN was elicited.] Moreover, the MMN amplitude was not affected by the presentation of a visual stimulus serving as a time signal for the subsequent occurrence of a deviant auditory stimulus (Ritter et al., 1999, Rinne et al., 2001).

6. Involuntary attention switch to auditory change (passive attention)

It is assumed that the activation of an auditory change-detection mechanism reflected by the MMN may also trigger the switching of attention to potentially important events in the unattended auditory environment (Näätänen et al., 1978, Näätänen and Michie, 1979, Giard et al., 1990). This suggestion is supported by numerous studies (Schröger, 1996a, Schröger, 1996b, Alho et al., 1997, Escera et al., 1998, Escera et al., 2000a, Escera et al., 2001, Escera et al., 2003, Schröger and Wolff, 1998a, Schröger and Wolff, 1998b, Wolff and Schröger, 2001a, Wolff and Schröger, 2001b, Schröger et al., 2000, Berti and Schröger, 2001; Roeber et al., 2003a, Roeber et al., 2003b, Berti et al., 2004, Shestakova et al., 2002a, Yago et al., 2001a, Yago et al., 2001b, Jankowiak and Berti, 2007, Rinne et al., 2006). These results indicated that MMN-eliciting sound changes in irrelevant auditory background stimulation distract task performance and, further, that they also elicit a subsequent P3a response thought to be associated with the actual orienting of attention to a deviance in an unattended sound sequence or in some irrelevant feature of attended sounds (see Squires et al., 1975, Ford et al., 1976, Escera et al., 1998, Escera et al., 2001, Shestakova et al., 2002a, Wang et al., 2005b; for reviews, see Friedman et al., 2001, Ranganath and Rainer, 2003). This view is further supported by the fact that the MMN may be followed by autonomic nervous system (ANS) responses associated with the involuntary orienting of attention, such as heart-rate deceleration and the skin-conductance response (Lyytinen et al., 1992; see also Sokolov et al., 2002).

For example, Schröger (1994), using a selective dichotic-listening paradigm, found that the RT to an infrequent softer-intensity stimulus in the right ear increased and the hit rate decreased when this target stimulus was preceded (with a 200-ms lead time) by a frequency deviant in the left ear. Specifically, when the frequency deviation was 50 Hz (standard 700 Hz), the RT increased by 12 ms, and with a deviation of 200 Hz, the RT prolongation was 26 ms. Both frequency deviants elicited MMNs, while the wider frequency deviant also elicited N2b-P3a waves. According to Schröger (1994), "this performance decrement was probably due to attentional capture to the to-be-ignored channel triggered by the deviants of this channel." He further proposed that the data pattern obtained supports the hypothesis that the neural processes generating the MMN may be involved in a mechanism of passive attention switch (pp. 88–89).

Interestingly, the MMN elicited by occasional intensity increments is followed by a distinct P3a, whereas that elicited by equivalent intensity decrements usually is not (Rinne et al., 2006; see also Näätänen, 1992; Fig. 3). Intensity decrements tend to be much less attention-catching than intensity increments. This might be because intensity increments also activate fresh afferent neurons (resulting in N1-component enhancement), whereas intensity decrements (resulting in N1 attenuation; Näätänen et al., 1989a) appear to activate no, or only a small number of, such neurons.

In general, this attention-switching mechanism responds even to minor changes, its sensitivity approaching that of behavioural discrimination in attend conditions. For instance, Berti et al. (2004) found that subjects’ performance in a tone-duration discrimination task deteriorated in almost half of the trials even with an occasional 1% frequency change in the tone. Importantly, in contrast to the MMN, the P3a and the RON (reorienting negativity; Schröger and Wolff, 1998b) were elicited only in trials with an RT prolongation, i.e., when an involuntary attention switch occurred.

The prefrontal cortex has an important role in controlling the direction of attention (Fuster, 1989, Stuss and Benson, 1986, Stuss and Knight, 2002; Näätänen, 1988, Näätänen, 1990, Näätänen, 1992). Therefore, it was proposed that the frontal MMN activity might signify the call for (cf. Öhman, 1979), or the initiation of, the involuntary orienting of attention to a change in the acoustic environment detected by the preattentive auditory-cortex MMN mechanism (Giard et al., 1990; Näätänen, 1979, Näätänen, 1985, Näätänen, 1988, Näätänen, 1990, Näätänen, 1992, Näätänen et al., 1978; Näätänen and Michie, 1979, Rinne et al., 2000). Consistent with the hypothesis that preconscious auditory-cortex change detection triggers frontal mechanisms of attention switch, the time course of the frontal activation to auditory change is slightly delayed relative to that of the auditory-cortex activation (Rinne et al., 2000, Kwon et al., 2002). Alternatively, it was suggested that, instead of attention switching, the prefrontal activation contributing to the MMN might be related to a contrast-enhancement mechanism that is activated when the supratemporal system has difficulty discriminating stimuli (Doeller et al., 2003, Opitz et al., 2002; see also the “sensitization negativity” of Näätänen et al., 1982; see also Alho et al., 1994a).

The role of the prefrontal MMN generator in initiating a switch of attention towards an auditory change was also supported by the effects of ethanol on the MMN and performance. Jääskeläinen et al. (1996c) found that the attenuation of the MMN amplitude by even small amounts of ethanol (0.55 g/kg) resulted from the attenuation of the prefrontal rather than of the supratemporal MMN subcomponent. Moreover, such a small amount of ethanol in fact decreased the distracting effect of changes in task-irrelevant auditory stimuli on performance accuracy in a visual forced-choice discrimination task, i.e., it improved performance under such conditions (Jääskeläinen et al., 1996a). Taken together, these findings suggest that even small amounts of ethanol attenuate the prefrontal process generating the frontal MMN subcomponent and initiating an involuntary switching of attention towards deviant sounds in an irrelevant auditory stream (for a review, see Jääskeläinen et al., 1996b). More recently, Kähkönen et al. (2005) obtained MMN evidence for serotonergic, and Kähkönen et al. (2002) for dopaminergic, modulation of this involuntary attention-switching response. [See also Simoens et al. (2007), who interpreted their duration-MMN data as suggesting that cortisol interferes with the preattentive mechanism of the MMN response.]

The functioning of this attention-switching mechanism was also studied in the presence of background noise. When standards and deviants were presented to the right ear, Levänen and Sams (1997) found that music presented to either ear abolished the MMNm to deviants, whereas white noise did so only when it was presented to the right ear. Furthermore, behavioural discrimination, studied in a separate condition, yielded an analogous pattern of data. The authors proposed that during music masking, standard stimuli are merged with the constantly changing features of the masker into integrated events. As the authors also noted, similar integration affects the neural representation of the deviants. Therefore, according to them, the representations of the standards and deviants become so similar that they do not allow change detection, resulting in an absent or strongly reduced MMNm and in impaired discrimination performance. (See also Kozou et al., 2005, Mäntysalo and Salmi, 1989.)

The role of the prefrontal MMN generator in a switch of attention to auditory change was also supported by Restuccia et al. (2005) who found that the attentional load of the primary task affected the frontal but not the temporal MMN generators.

However, an auditory change does not trigger an attention switch every time it occurs (e.g., Berti et al., 2004); it may remain consciously unperceived. There are several possible reasons for this. First, the deviant sound may not differ enough from the stimulus represented by the sensory-memory trace, either because the difference is very small or because the neural trace is informationally too diffuse; that is, the representational width of the neural trace for the auditory attribute involved is too large in relation to the magnitude of change (Näätänen and Alho, 1997). Second, the memory trace of the previous sounds may already have decayed, so that no automatic comparison process can take place. Third, when attention at the moment of change is intensively focused elsewhere, the threshold for attention switch may be elevated (Lyytinen et al., 1992, Näätänen, 1992, Harmony et al., 2000, Restuccia et al., 2005). Fourth, the excitability of the frontal MMN generator may be temporarily decreased, for example, by alcohol (Jääskeläinen et al., 1996c). Fifth, when the subject knows in advance when a deviant stimulus in a block will occur, no attention switch takes place (as reflected by the absence of behavioural effects and of the P3a), even though the MMN is normally elicited (Sussman et al., 2003b).

For a model of attention and automaticity in the auditory modality, see Näätänen (1990, 1992). This model (Fig. 4) postulates a fully automatic stimulus analysis (by the Permanent Feature-Detector and Transient-Detector systems) and a brief storage of this information in a sensory-memory trace (in the neural substrate of Sensory Memory). The formation of this trace on the basis of the input from different feature detectors underlies the emergence of the sound percept. Furthermore, in this model, there are two parallel routes to attention switch: (1) N1 neurons triggered by sound onset (causing attention switch to the occurrence of the sound); and (2) MMN neurons triggered by auditory change, i.e., by a new input to the neural substrate of sensory memory that does not correspond to the stimulation already represented by this memory system. This memory-trace formation initiated by the new stimulus in the auditory cortex then triggers the frontal-cortex mechanisms of attention switch to stimulus change (the auditory- and frontal-cortex MMN subcomponents, respectively). In addition, the precise sensory information maintained in the neural mechanism of sensory memory is used to develop the attentional trace when selective attention is to be directed to the corresponding stimulus stream; the attentional trace is a temporary selection mechanism for the to-be-attended input that depends on continuous maintenance (Näätänen, 1982, Näätänen, 1990). Hence, in this model, the sensory-memory information involved in MMN elicitation is used to tune the selective-attention mechanism to select just a designated kind of stimulus among concurrent ones, e.g., a tone of 1000 Hz among tones of 1050 Hz. Consequently, to set up and maintain this highly precise voluntary representation, the executive mechanism must use the corresponding stimulus representation of sensory memory.
The assumed dependence of selective attention on fresh sensory-memory information is shown, for example, by the fact that one needs to receive a few exemplars of the to-be-attended stimulus before stimulus selection can occur, i.e., the selective sensory state (Näätänen, 1975, Näätänen, 1982, Näätänen, 1990) can be developed (and the PN be generated) (see Hansen and Hillyard, 1983, Hansen and Hillyard, 1988, Donald and Young, 1982).

6.1. MMN to different types of auditory change

The MMN (MMNm) is elicited by a discriminable change in any repetitive aspect of auditory stimulation. Hence, it is elicited by infrequent changes in the frequency (Hari et al., 1984, Näätänen et al., 1978, Sams et al., 1985a, Alho et al., 1993b, Jacobsen and Schröger, 2001, Berti et al., 2004, Lang et al., 1990, Takegata and Morotomi, 1999; Tervaniemi et al., 2000b, Yago et al., 2001a, Yago et al., 2001b, Sambeth et al., 2006) (Fig. 1), intensity (Lounasmaa et al., 1989, Näätänen et al., 1978, Näätänen et al., 1987b, Näätänen et al., 1987a, Näätänen et al., 1988, Kisley et al., 2004, Novitski et al., 2007) (Fig. 3), the direction of a frequency glide (Sams and Näätänen, 1991, Pardo and Sams, 1993), timbre (Tervaniemi et al., 1997a, Tervaniemi et al., 1997b, Toiviainen et al., 1998), and spatial location (Paavilainen et al., 1989, Picton et al., 2000, Schröger, 1995, Winkler et al., 1998, Kaiser et al., 2000a, Kaiser et al., 2000b, Kaiser and Lutzenberger, 2001, Kujala et al., 1992, Al’tman et al., 2004, Ruusuvirta, 1999, Tata and Ward, 2005, Deouell et al., 2003, Deouell et al., 2006, Sonnadara et al., 2006a, Sonnadara et al., 2006b) of a repetitive sound. Furthermore, changes in complex sounds such as chords also elicit an MMN (Alho et al., 1996, Alho and Sinervo, 1997, Näätänen et al., 1993b, Nordby et al., 1994, Sams and Näätänen, 1991; Schröger et al., 1992, Schröger et al., 1994, Schröger et al., 1995, Schröger et al., 1996, Tervaniemi et al., 1999, Tervaniemi et al., 2000b, Winkler et al., 1998, Winkler and Näätänen, 1993). Also, a frequency change in a continuous tone elicits the MMN (together with the N1) (Lavikainen et al., 1995).

In addition, the MMN is elicited by changes in the temporal aspects of auditory stimulation such as sound duration (Kaukoranta et al., 1989, Jaramillo et al., 1999, Jaramillo et al., 2000, Ponton et al., 1997, Grimm et al., 2004, Deouell et al., 2003, Joutsiniemi et al., 1998, Näätänen et al., 1989b, Näätänen et al., 2004b, Michie et al., 2000, Todd and Michie, 2000, Roeber et al., 2003a, Roeber et al., 2003b, Ylinen et al., 2006), rise time (Lyytinen et al., 1992), ISI (Ford and Hillyard, 1981, Hari et al., 1989a, Kujala et al., 2001a, Näätänen et al., 1987a, Näätänen et al., 1993c, Nordby et al., 1988a, Sable et al., 2003), stimulus order (Nordby et al., 1988b, Kujala et al., 2001b, Schröger et al., 1995, Schröger et al., 1996, Tervaniemi et al., 1997a), a gap of a few milliseconds in the middle of a brief stimulus (Desjardins et al., 1999, Bertoli et al., 2001, Bertoli et al., 2002, Uther et al., 2003), and, as already reviewed, stimulus omission in blocks with a short constant SOA (Raij et al., 1997, Yabe et al., 1997, Yabe et al., 1998, Oceák et al., 2006) or the omission of the second tone of two paired tones with a short constant SOA (Tervaniemi et al., 1994b). Moreover, Schröger (1994), Schröger et al. (1994), Winkler and Schröger (1995), and Alho et al. (1993a, 1996) found an MMN in response to deviance in the temporal structure of a complex spectrotemporal sound pattern consisting of simple tonal segments (Spiegel and Watson, 1981, Port, 1991). This MMN was obtained by infrequently exchanging the positions of two segments of this complex tonal pattern. In addition, Imada et al. (1993a) and Kujala et al. (2000) found an MMN when the timing of one part of a repetitive rhythmic pattern was infrequently altered (see also Jones and Perez, 2002, Vaz Pato and Jones, 1999, Jones et al., 2000).
Together, these studies show that the temporal structure of sound patterns is encoded in detail by the auditory traces involved in the change-detection process.

Hence, it is clear that what is actually stored by the traces underlying the MMN process is sensory information about stimuli (or streams; Bregman, 1990) as events in time rather than as mere static stimulus properties (and, as already reviewed, these traces represent feature-integrated sensory memory). This form of sensory information corresponds to auditory perception (its subjective contents as a dynamic event) and serves the recognition of verbal, melodic, and other crucially time-based information (Näätänen, 1992; Näätänen and Winkler, 1999, Winkler et al., 1997; see also Cowan, 1984).

Furthermore, as already mentioned, Winkler et al. (2003b) demonstrated that the MMN is not just a laboratory phenomenon but is elicited even by changes in auditory stimulus streams in the middle of an everyday sound environment with multiple concurrently active sound sources. The authors presented traffic noise and movie sounds in parallel as background stimulation, overlapped by a sequence of foot-step sounds ending with a deviant step sound produced by stepping on a piece of glass. Even though the foot-step sounds could not be separated from the other sounds in the acoustic description of the overlapping events, a distinct MMN was elicited by the deviant step sound. Moreover, MMN elicitation tolerates some variation in the standard stimulus, both in the feature distinguishing deviants from standards (Winkler et al., 1990) and in other features (Gomes et al., 1995, Huotilainen et al., 1993, Winkler et al., 1990).

The MMN is also elicited by deviant speech stimuli and by violations of abstract rules followed by the sequence of standard stimuli. These results are reviewed separately later in the present article.

7. MMN as a function of the magnitude of stimulus change

In general, the MMN amplitude increases and its peak latency shortens with increasing magnitude of stimulus deviation. For frequency changes of a sinusoidal tone, this was shown by several studies (e.g., Sams et al., 1985a, Lang et al., 1990, Tiitinen et al., 1994, Berti et al., 2004, Näätänen et al., 1997, Yago et al., 2001a, Yago et al., 2001b, Novitski et al., 2004, Novitski et al., 2007; for an MMN-amplitude increase with increased spectral complexity of stimuli, see Tervaniemi et al., 2000b); for intensity decrements by Näätänen et al. (1989a) and Rinne et al. (2006); for intensity increments by Rinne et al. (2006); for duration changes by Amenedo and Escera (2000), Näätänen et al. (1989b, 2004b), and Jaramillo et al. (2000); for changes in the spatial locus of sound origin by Paavilainen et al. (1989) and Nager et al. (2003a); for changes in the ISI (offset-to-onset) by Näätänen et al. (1993c) and Kujala et al. (2001a); and for changes in the width of a gap in the middle of a tone by Desjardins et al. (1999), Bertoli et al. (2001, 2002), and Uther et al. (2003). Furthermore, when a deviant differs from the standard in two or more attributes, the MMN amplitude shows additivity (Takegata et al., 1999, Takegata et al., 2001a, Takegata et al., 2001b, Takegata et al., 2001c, Schröger, 1995, Wolff and Schröger, 2001a, Wolff and Schröger, 2001b), i.e., the MMN to such a multi-attribute deviant approximates the sum of the MMNs to the corresponding single-attribute deviants. This additivity, however, involves the supratemporal but not the frontal component of the MMN (Paavilainen et al., 2003b; see also Wolff and Schröger, 2001b).
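To make concrete what "amplitude" and "peak latency" refer to here, the following is a minimal Python sketch, assuming the common practice of deriving the MMN as the deviant-minus-standard difference wave and taking its most negative point within a search window. The 100 to 250 ms window, the 1-kHz sampling rate, the function names, and the synthetic Gaussian-shaped deflections are illustrative assumptions, not values from the studies cited; many analyses instead use the mean amplitude over a window rather than the single peak.

```python
import math

# Minimal sketch (illustrative assumptions, not from the cited studies):
# ERPs are lists of voltages (microvolts) sampled at srate_hz, with sample 0
# at stimulus onset; the MMN is the most negative point of the
# deviant-minus-standard difference wave inside a search window.

def difference_wave(deviant, standard):
    """Return the deviant-minus-standard ERP, sample by sample."""
    if len(deviant) != len(standard):
        raise ValueError("ERPs must have the same number of samples")
    return [d - s for d, s in zip(deviant, standard)]

def mmn_peak(diff, srate_hz, window_ms=(100.0, 250.0)):
    """Return (amplitude_uV, latency_ms) of the most negative sample in the window."""
    start = int(round(window_ms[0] * srate_hz / 1000.0))
    stop = int(round(window_ms[1] * srate_hz / 1000.0))
    segment = diff[start:stop + 1]
    idx = min(range(len(segment)), key=segment.__getitem__)
    return segment[idx], (start + idx) * 1000.0 / srate_hz

def synthetic_erp(peak_uv, peak_ms, n=300, width_ms=25.0):
    """Hypothetical Gaussian-shaped deflection, one sample per ms (1 kHz)."""
    return [peak_uv * math.exp(-((t - peak_ms) ** 2) / (2 * width_ms ** 2))
            for t in range(n)]

standard = [0.0] * 300
small_dev = synthetic_erp(-1.0, 200.0)  # stands in for a small deviation
large_dev = synthetic_erp(-3.0, 150.0)  # stands in for a large deviation
for label, dev in (("small", small_dev), ("large", large_dev)):
    amp, lat = mmn_peak(difference_wave(dev, standard), srate_hz=1000)
    print(label, round(amp, 2), lat)
```

With these synthetic inputs the sketch prints `small -1.0 200.0` and `large -3.0 150.0`. The larger amplitude and shorter latency for the "large" deviant are built into the invented data; the sketch only shows how the two measures are read off a difference wave, not evidence for the effect.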

8. MMN as an index of discrimination accuracy

In the foregoing, it was suggested that the MMN represents a preattentive feature-specific code of stimulus change and, further, that it might provide an objective index of discrimination accuracy for the different acoustic feature dimensions. This is supported by the fact that, in general, the MMN sensitivity to small stimulus changes seems to correspond quite well to behavioural discrimination thresholds, both in healthy subjects and in clinical populations. Some of these studies are reviewed in this section.

In normal subjects, Lang et al. (1990) found that the accuracy of behavioural discrimination of a frequency difference between two successively presented tones strongly correlated with the MMN amplitude (recorded in a separate, passive session). Their subjects were 17-year-old high-school pupils. Three groups (“good”, “moderate”, and “poor”) were formed on the basis of behavioural pitch-discrimination accuracy in a same-different task with paired stimuli. In the subsequent MMN recordings (during which subjects were reading), the standard-stimulus frequency was fixed at 698 Hz, whereas the deviant-stimulus frequency varied across blocks. In the good-discrimination group, the MMN was elicited by a frequency deviation of 19 Hz; in some of these subjects, 12 Hz was enough. (The ISI used was far too long for the best discrimination sensitivity.) In contrast, for the poor performers, the deviation had to be increased to 50–100 Hz, on average, before an MMN was elicited. The moderate performers fell between the two extreme groups (see Fig. 5).


Fig. 5. MMN as a function of behavioural pitch-discrimination accuracy. The MMN (recorded in a separate reading condition) was larger in amplitude in school children classified as “good” in a behavioural pitch-discrimination task (Seashore’s test of musicality) than in those who were “mediocre” or “weak” in this task. Adapted, with permission, from Lang et al. (1990).

Moreover, employing rhythmic stimulus patterns with an occasional order reversal of two tones belonging to such a pattern, Tervaniemi et al. (1997a) found that subjects who detected these reversals with high accuracy in a discrimination task (a test of musical abilities) had a considerably larger MMN (recorded in a passive condition) than those who showed poor discrimination. For corroborating results with complex stimulus patterns, see Näätänen et al. (1993b), Tervaniemi et al. (2001), van Zuijen et al. (2004, 2005), and Brattico et al. (2002a). In addition, Aaltonen et al. (1994), using phoneme stimuli, found a correlation between the MMN amplitude and the discrimination of minor (within-category) changes in the Finnish vowel /y/ (see also Martin et al., 1997; but see Sharma et al., 1993).

Bazana and Stelmack (2002), however, found no correlation between performance and the MMN amplitude, but reported shorter MMN peak latencies in subjects with higher than lower “mental ability” (determined on the basis of a battery of mental-ability tests and academic performance).

With clinical populations, a close relationship between speech perception and the MMN amplitude was found in cochlear-implant patients by Kraus et al. (1995), Groenen et al. (1996), Kelly et al. (2005), and Roman et al. (2005). In addition, Aaltonen et al.’s (1993) results with aphasic patients suggest that the MMN, or its absence, could provide specific information about the perceptual deterioration caused by a brain lesion. Two of their patients, those with a posterior left-hemispheric lesion, had a normal MMN to the frequency change of a simple tone, whereas a vowel change (from Finnish /y/ to /i/) elicited no MMN. Furthermore, Kraus et al. (1996), studying school children who were good or poor at discriminating the /ba/ and /da/ syllables, found a distinct MMN for these syllables only in the children with good behavioural discrimination. (For the easier contrast /ba/-/wa/, there was no MMN or behavioural difference between the two groups.) Importantly, the children with speech-discrimination difficulties were also the ones who had learning problems, suggesting that these discrimination difficulties play a role in the emergence of learning or other problems at school.

The MMN can also reflect improvement in discrimination performance as a result of training. These studies will be reviewed in Section 8.1.

Some studies (e.g., Alho and Sinervo, 1997, Dalebout and Stack, 1999, Kraus et al., 1999, Allen et al., 2000, Paavilainen et al., 2007, Tremblay et al., 1998) even obtained MMNs to changes that were not behaviourally discriminated. In the course of discrimination training, an MMN can appear even before behavioural discrimination ability (Tremblay et al., 1998).

In all these studies, relatively short ISIs were used to prevent memory decay from affecting the results. If the ISI is long, no MMN may be elicited, as already reviewed, probably because the memory trace of the standard stimulus does not last until the deviant stimulus is delivered. Using this logic, Pekkonen et al. (1994) were able to separately determine discrimination accuracy and sensory-memory duration for tonal frequency in patients with Alzheimer’s disease. With a short ISI of 1 s, the authors found no MMN difference between the patients and the controls, concluding that the patients’ discrimination accuracy was not affected. In contrast, with a 3-s ISI, patients showed no MMN, whereas controls still had a distinct MMN. This data pattern permitted the authors to conclude that sensory-memory decay was accelerated in patients with Alzheimer’s disease, whereas their auditory discrimination was unaffected. In the same vein, with a short ISI, there was no MMN-amplitude difference between young (mean 22 yr) and elderly (mean 59 yr) males, whereas when the ISI was prolonged, the MMN amplitude was much more attenuated in the elderly than in the young (Pekkonen et al., 1996; see, however, Czigler et al., 1992).
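The dissociation logic used by Pekkonen et al. (1994) can be illustrated with a deliberately simplified toy model. The sketch below assumes, purely for illustration, that the standard's memory trace decays exponentially with a group-specific time constant, and that an MMN appears only when the deviation exceeds a discrimination threshold and the trace is still above a fixed strength at deviant onset. The decay rule, the function name, and all parameter values (time constants, thresholds, deviation size) are invented and are not taken from the cited studies; the point is only that identical discrimination thresholds combined with different decay rates reproduce the qualitative pattern of no group difference at a short ISI and an MMN loss in one group at a long ISI.

```python
import math

def mmn_elicited(deviation, isi_s, discrim_threshold, tau_s, trace_threshold=0.5):
    """Toy rule (hypothetical): an MMN appears only if the deviation exceeds
    the discrimination threshold AND the standard's trace survives the ISI."""
    trace_strength = math.exp(-isi_s / tau_s)  # assumed exponential decay
    return deviation >= discrim_threshold and trace_strength >= trace_threshold

# Invented groups: identical discrimination threshold, different decay rates.
groups = {"controls": 10.0, "patients": 2.0}  # decay time constants in s (invented)
DEVIATION, DISCRIM = 0.10, 0.05               # relative frequency change (invented)

for isi in (1.0, 3.0):
    row = {g: mmn_elicited(DEVIATION, isi, DISCRIM, tau) for g, tau in groups.items()}
    print(isi, row)
```

Under these made-up values, both groups show an MMN at the 1-s ISI, while only the "controls" still show one at 3 s (their trace strength is exp(-0.3) ≈ 0.74 versus exp(-1.5) ≈ 0.22 for the "patients"). Changing only `DISCRIM` would instead remove the MMN at every ISI, which is why the short-ISI condition can be read as a test of discrimination and the long-ISI condition as a test of trace duration.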

For reviews on the MMN as an index of discrimination accuracy, see Näätänen and Alho (1995, 1997). Näätänen and Alho (1995) proposed that the MMN provides the best available neurophysiological measure of automatic central processing in audition.

For the MMN to become a useful tool in everyday clinical practice, however, one should be able to measure it reliably in single subjects and patients (see Lang et al., 1995, Kraus et al., 1999). This would require, in addition to improved paradigms (see Näätänen et al., 2004a, Pakarinen et al., 2007), improved signal-analysis methods (Picton et al., 2000). For some progress along these lines, see Ponton et al. (1997, 2000a, 2000b, 2002), McGee et al. (1997), Ha et al. (2003), and Marco-Pallares et al. (2005).

For MMN replicability, see Escera et al., 1999, Escera and Grau, 1996, Frodl-Bauch et al., 1997, Joutsiniemi et al., 1998, Kathmann et al., 1999, Kujala et al., 2001a, Kujala et al., 2001b, Pekkonen et al., 1995; see, however, Cacace and McFarland, 2003, Uwer and Von Suchodoletz, 2000. For MMNm replicability, see Tervaniemi et al., 1999, Tervaniemi et al., 2005a. In general, MMN replicability is quite good at the group level, but at the individual level there is still ample room for improvement before the MMN provides a reliable clinical tool for individual patients.

8.1. Learning/training effects on the MMN

A large number of studies showed that with the MMN, one can monitor the progress in sound discrimination. In an early study already referred to, Näätänen et al. (1993b) used a complex spectrotemporal stimulus pattern as the standard stimulus. They found that subjects who were able to detect a slightly deviant pattern in a behavioural discrimination task showed an MMN to this deviant stimulus in a subsequent passive condition (Fig. 6). In contrast, no MMN was elicited in those subjects who were not able to behaviourally discriminate the stimuli in the preceding discrimination condition. However, after they learned to discriminate them during the course of the session, then the MMN was elicited by the deviant patterns in the subsequent passive conditions. Importantly, in another subject group, even a long-duration passive exposure to these complex stimuli resulted in no MMN emergence. For further early MMN studies with these kinds of stimuli, see Schröger et al., 1992, Schröger, 1994.


Fig. 6. (Left) Grand-average vertex (Cz) event-related potentials (ERPs) of 7 subjects (reading a book) to standard (dashed lines) and deviant (solid lines) stimulus patterns during the early, middle, and late MMN-recording phases of the session which were each preceded by a behavioural discrimination task. The performance of the subjects in the discrimination test at the early phase was weak but was considerably improved after the second and, in particular, the third phase. This improvement was accompanied by MMN (shadowed area) emergence. The eight-segment stimulus pattern is schematically illustrated at the bottom of the figure. The only difference between the standard (the 6th segment 565 Hz) and deviant (650 Hz) patterns is indicated by the arrow. (Right) Corresponding data for those 5 subjects who were good in discriminating deviants among standards even at the early phase. They had an MMN even in this early phase of the session. Adapted, with permission, from Näätänen et al. (1993b).

Subsequently, Atienza et al. (2001), using identical stimuli, corroborated this MMN training effect and, very importantly, showed that it could be recorded even in REM sleep (3 days after the training). Furthermore, in a subsequent study using the same stimuli, Atienza et al. (2004) found evidence suggesting that the MMN might also reflect memory consolidation after training: the MMN recorded 48 h after the end of training was considerably larger in amplitude than that recorded 24 h after training.

Recently, Gottselig et al. (2004) added to this paradigm a second, more difficult, deviant (each deviant being presented at p = .075 in the same stimulus block). The discrimination of one of the two deviants was practiced (6 min.) by one subject group, while a second group practiced the discrimination of the other deviant. The practice-related MMN enhancement was again found but only for the easier deviant, that used by Näätänen et al. (1993b), and this enhancement occurred regardless of which deviant was practiced. Furthermore, the magnitude of this enhancement was highly correlated with the improvement in discrimination. In addition, there was a bilateral superior-temporal MMN source with a left-hemispheric predominance for the easier deviant before and after training, whereas only the right-hemispheric source was strengthened by training.
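A stimulus block of the kind described here, with two deviants each presented at p = .075 among standards, can be generated along the following lines. This is a minimal Python sketch under assumed details: exact (rather than trial-by-trial probabilistic) deviant counts, plain shuffling, a 1000-trial block, and hypothetical label names, none of which are taken from Gottselig et al. (2004). Actual oddball studies often add constraints such as a minimum number of standards between deviants, which this sketch does not implement.

```python
import random
from collections import Counter

def oddball_sequence(n_trials, deviant_probs, seed=None):
    """Return a shuffled list of stimulus labels in which each deviant
    occurs exactly round(n_trials * p) times and the rest are standards."""
    rng = random.Random(seed)  # local generator so the block is reproducible
    seq = []
    for label, p in deviant_probs.items():
        seq += [label] * round(n_trials * p)
    seq += ["standard"] * (n_trials - len(seq))
    rng.shuffle(seq)
    return seq

# Two hypothetical deviant types, each at p = .075 (standards fill the rest).
block = oddball_sequence(1000, {"deviant_easy": 0.075, "deviant_hard": 0.075}, seed=1)
print(Counter(block))
```

Fixing the counts rather than drawing each trial independently guarantees the nominal proportions (here 75 + 75 deviants and 850 standards) in every block, at the cost of making the final trials slightly more predictable; drawing trials with `random.choices` would be the probabilistic alternative.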

Tervaniemi et al. (2001) found an equivalent training effect with short (8-tone) melodies occurring randomly at 12 different frequency levels, the deviants having a slightly different contour from that of the standards; however, the effect was found only in a group of “musical” subjects. In “nonmusical” subjects, no MMN emerged at any phase of the experiment, and they never learned to discriminate deviants from standards.

A corresponding discrimination-training effect but one with phonetic stimuli was found by Kraus et al. (1995) who used different (within-category) variants of the /da/ syllable initially impossible for subjects to discriminate from one another. Importantly, the training effect, i.e., the improved discrimination performance along with the enhanced MMN amplitude, was present even at one month from the end of the training.

In addition, Kujala et al. (2003) studied the MMNm to Morse-coded syllables before and after an intensive Morse-code training period of 3 months. They found that, initially, the MMNm to changes in the Morse-coded syllables was, on average, larger in the hemisphere opposite to the one dominant for the MMNm to changes in native-language speech sounds whereas after the training period of 3 months, the pattern was reversed: the mean Morse-code MMNm became lateralized to the hemisphere that was dominant for the speech-sound MMN. This suggests that memory traces for the Morse-coded acoustic language units develop within the hemisphere that already accommodates the permanent traces for natural speech sounds. The authors suggested that these plastic changes manifest the close associations formed between the neural representations of the tone patterns and phonemes. Further, Menning et al. (2002) trained German subjects to discriminate the different mora durations of Japanese from each other. In Japanese, the “mora” is a temporal unit that divides words into almost isochronous segments (e.g., na-ka-mu-ra). So, in essence, what was practiced was duration discrimination (10 sessions of 1.5 h per day), which resulted in improved discrimination, as reflected by behavioural measures as well as by the MMNm-amplitude increase and peak-latency decrease. In addition, the Japanese subjects showed a more sensitive MMNm than did the German subjects for smaller differences. Furthermore, there were significant differences for the pre- versus post-training MMNm generator loci between the Japanese and German subjects.

It was concluded that even in adults, the perceptual learning of non-native mora-timing occurs rapidly and deeply and, further, that the enhanced behavioural and neurophysiological sensitivity caused by training indicates a strong relationship between learning and plastic changes, as reflected by the MMNm, in the cortical substrate.

Previously, Menning et al. (2000) succeeded in demonstrating a training effect in the discrimination of frequency differences in a simple sinusoidal tone. Their standard stimulus was 1000 Hz, whereas the deviants were 1005, 1010, and 1050 Hz. Frequency discrimination rapidly improved during the first training week and thereafter showed small but constant improvements. Furthermore, the N1m to standard stimuli and the MMNm to deviant stimuli increased in amplitude during the training. This enhancement persisted until the end of training but was somewhat reduced 3 weeks later.

9. The MMN to speech

9.1. The MMN as an index of memory traces for phonemes

As already mentioned, the MMN (MMNm) is also elicited when speech sounds are presented in a passive oddball paradigm (Aaltonen et al., 1987, Aaltonen et al., 1993, Aaltonen et al., 1994, Alho et al., 1998a, Bradlow et al., 1999, Cheour et al., 1998, Dehaene-Lambertz and Baillet, 1998, Diesch and Luce, 1997, Dehaene-Lambertz, 2000, Friederici et al., 2002, Ikeda et al., 2002, Jacobsen et al., 2004a, Jacobsen et al., 2004b, Honbolygó et al., 2004, Kayser et al., 1998, Kushnerenko et al., 2001, Leppänen et al., 2002, Pihko et al., 1999, Pihko et al., 2005, Csépe, 1995, Aulanko et al., 1993, Kraus et al., 1992, Kraus et al., 1993a, Kraus et al., 1993b, Kraus et al., 1995, Kraus et al., 1996, Kraus et al., 1999, Maiste et al., 1995, Martin and Boothroyd, 1999, Martin et al., 1999, Mathiak et al., 1999, McGee et al., 1996, Phillips et al., 2000, Rinne et al., 1999a, Rivera-Gaxiola et al., 2000, Szymanski et al., 2000, Sams et al., 1990, Savela et al., 2003, Sharma et al., 1993, Sharma and Dorman, 1999, Sharma and Dorman, 2000, Sandridge and Boothroyd, 1996, Sussman et al., 2004, Vihla and Eulitz, 2003; see also Dehaene-Lambertz and Dehaene, 1994). [Some studies (Wunderlich and Cone-Wesson, 2001, Pettigrew et al., 2004a, Pettigrew et al., 2004b) were not able to obtain reliable MMNs with minor speech-sound contrasts, however.] Furthermore, studies using speech sounds also showed that the MMN can be used to probe the permanent language-specific speech-sound memory traces. Näätänen et al. (1997; see also Dehaene-Lambertz, 1997, Sharma and Dorman, 1999, Szymanski et al., 1999) found that an infrequent vowel deviant presented in a sequence of native-language vowel standards elicited a larger-amplitude MMN when it was a typical exemplar of a vowel category in the subject’s native language (Finnish) than when it was not a typical vowel in this language (the Estonian vowel /õ/, which does not exist in Finnish).
According to subsequent MEG recordings, the vowel-related MMN enhancement originated from the left posterior auditory cortex, suggesting that this area is the locus of the language-specific vowel traces. In contrast, the acoustic change necessarily accompanying the vowel change elicited a bilateral auditory-cortex MMN subcomponent. The authors proposed that these long-term, or permanent, traces serve as recognition patterns that are activated by the corresponding speech sounds, enabling one to perceive them correctly, and, further, that these traces provide reference information for pronunciation.

Furthermore, using a similar Finnish-Estonian cross-linguistic design, Cheour et al. (1998) obtained evidence suggesting that the language-specific speech-sound memory traces develop between 6 and 12 months of age (see also Rivera-Gaxiola et al., 2005).

In addition, in French infants (mean age 3.7 months), Dehaene-Lambertz and Baillet (1998) found that the mismatch response (of positive polarity) to an across-category change of a syllable was larger in amplitude than that to an acoustically equivalent within-category change, and differed from it in scalp distribution. Also, as in adults (e.g., Näätänen et al., 1997), the mismatch responses of infants to acoustic and phoneme-category changes occurred in parallel. The dipole calculated for the across-category response was located posterior and dorsal to that for the within-category response (see also Dehaene-Lambertz et al., 2000).

The existence of language-specific memory traces for phonemes was also reflected in Finnish–Hungarian (Winkler et al., 1999b), English–Japanese (Phillips et al., 1995), French–Japanese (Dehaene-Lambertz et al., 2000), English–Hindi (Shafer et al., 2004; see also Rivera-Gaxiola et al., 2000), and French–Hindi (Dehaene-Lambertz, 1997) cross-language studies. Furthermore, Aaltonen et al. (1997) found that the MMN amplitude paralleled the perceptual magnet effect (Kuhl, 1991; see also Aulanko et al., 1993). In subjects who categorized synthesized /i/ and /y/ vowel sounds efficiently, discrimination performance was poorer and the MMN amplitude lower for synthesized vowel pairs in the vicinity of the prototypical /i/ than for an equally large physical change between two synthesized vowels that differed substantially from the prototype but were still categorized as /i/ (Aaltonen et al., 1997; see also Vihla et al., 2000).

An MMN category effect in the voice-onset time (VOT) continuum for the /da/–/ta/ distinction was found by Sharma and Dorman (1999). Consistent with this, the subjects’ behavioural discrimination of VOT changes of equal magnitude was more accurate across the category boundary than within the /ta/ category. Analogous results were not, however, obtained in similar studies by Maiste et al. (1995), Sams et al. (1990), and Sharma et al. (1993).

It was also found that French vowel contrasts that initially elicited no MMN in Finnish children aged 4–6 years began to elicit it soon after the children entered a French kindergarten (Cheour et al., 2002d; see also Dehaene-Lambertz and Baillet, 1998, Rivera-Gaxiola et al., 2005). Furthermore, learning a second language in adulthood also caused an MMN-amplitude enhancement (Winkler et al., 1999b; see also Winkler et al., 1999a, Tremblay et al., 1997, Tremblay et al., 1998; for reviews, see Kraus and Cheour, 2000, Näätänen, 2001; see also Kuhl, 2004).

In Winkler et al.’s (1999a, 1999b) studies, the standard stimulus was a vowel perceived as /e/ by both Finns and Hungarians, whereas the deviant stimulus was the Finnish /ä/, which in Hungarian (as spoken in the Budapest region) belongs to the same phoneme category as the /e/ used as the standard. Hungarians (of the Budapest region) who knew no Finnish had no MMN to the deviant /ä/ in the ignore condition and were very poor at discriminating it behaviourally from /e/ in the discrimination condition (Fig. 7). In contrast, Hungarians who had lived in Finland for years and learned to speak fluent Finnish had a distinct MMN to /ä/, quite similar to that of native Finns, and were also able to discriminate it behaviourally from /e/. These results thus showed the formation of new phoneme categories as a result of foreign-language exposure and use, the original wide /e/ category (which also incorporated /ä/) being divided into separate /e/ and /ä/ categories. They further showed that the MMN can be used to monitor how adults learn to perceive and discriminate foreign-language phonemes correctly.


Fig. 7. MMN to vowel contrasts (top) and performance in a vowel identification task (bottom). Top: grand-mean MMN responses at Fz in Finns (dashed line), in Hungarians fluent in Finnish (solid yellow), and in “naive” Hungarians not speaking Finnish (solid green) to deviants /ae/ (left) and /y/ (right) presented in sequences of a repetitive vowel /e/. No MMN was elicited in “naive” Hungarians by the vowel /ä/ (/ae/), which is phonemically relevant for Finns but not for Hungarians. However, very similar MMNs were elicited by this vowel both in Finns and in Hungarians fluent in Finnish. In contrast, the vowel /y/, present in both languages, elicited similar MMNs in all groups. Bottom: group averages of correct identification rates (left) of the vowels /ä/ and /e/, and reaction times (right) (standard errors of the mean in brackets). Whereas Finns and Hungarians fluent in Finnish reliably identified these vowels, the performance of “naive” Hungarians was at chance level. The Finns and fluent Hungarians were also faster than the “naive” Hungarians in the identification task. From Winkler et al., 1999b.

Importantly, in Tremblay et al.’s (1998) study, the MMN evidence for the development of discrimination ability in the course of training appeared earlier in time than the behavioural discrimination of the trained phoneme contrast. Furthermore, Tremblay et al.’s (1997) results showed transfer of a discrimination-training effect to an untrained discrimination. Normal-hearing English-speaking adults were trained to discriminate and identify a voicing contrast that does not occur in English but is phonetically salient in Hindi and Eastern Armenian. The subjects were trained on this voice-onset-time distinction in a bilabial context but were evaluated before and after training on their ability to discriminate and identify the voicing contrast both in the bilabial context (training condition) and in an alveolar context (transfer condition). After training, subjects could identify and discriminate both the training and the transfer contrasts behaviourally. These training and generalization effects were manifested as increases in MMN area and duration, which were more pronounced over the left than the right frontal cortex. MMN data also suggest, however, that foreign-vowel discrimination learning has its limitations (Peltola et al., 2003).

In addition, cross-linguistic MMN differences in the processing of speech-sound duration were found by Nenonen et al. (2003, 2005), Ylinen et al. (2005, 2006), and Minagawa-Kawai et al. (2004). Furthermore, Tervaniemi et al. (2006) extended these results to non-speech sounds: the MMN to a duration decrement of a simple sinusoidal tone was much larger in amplitude in Finns than in Germans. (Duration is linguistically less relevant in German than in Finnish.) Behavioural duration discrimination, too, was more accurate in the Finnish than in the German subjects. In contrast, frequency discrimination showed neither an MMN nor a behavioural difference between the two groups. Previously, Jaramillo et al. (1999) found that Finnish subjects’ MMN to duration changes was larger in amplitude when these changes occurred in speech sounds than when they occurred in acoustically equivalent complex tones.

Moreover, as already mentioned, Menning et al.’s (2002) German subjects were inferior to Japanese ones in behavioural duration discrimination (with Japanese speech sounds) and also had a less sensitive MMNm to these changes than the Japanese subjects, apparently because duration discrimination is much less important for understanding German than Japanese.

Further, Weber et al. (2004) studied German infants’ perception of the trochaic stress pattern (stress on the first syllable of the word), the most common stress pattern in German, which helps listeners to segment consecutive words. An MMN was elicited by trochaic stress-pattern stimuli among iambic standards (stress on the second syllable) in 5-month-old but not in 4-month-old German infants. This result demonstrates a clear developmental change between 4 and 5 months in the processing of stress patterns relevant for word recognition. (For P3a data suggesting that changes in prosody have a powerful attention-catching quality, see Wang et al., 2005b.)

Finally, new perspectives on very early speech-sound learning were opened by Cheour et al. (2002a), who found that sleeping newborns can learn to discriminate vowels. Before sleep, they had no MMN to the Finnish /y/–/i/ contrast, but they had one after being exposed to this stimulation during sleep (see Fig. 8).


Fig. 8. Left: There was no MMN for the /y/i/–/y/ contrast in sleeping newborns in the evening (dashed line), but a large MMN was elicited in these infants (still sleeping) in the morning (solid line) after nocturnal training with these stimuli. Middle: Control Group I, receiving no training; right: Control Group II, trained with the /a/–/e/ contrast. In neither control group was there an MMN for the /y/i/–/y/ contrast, in the evening or in the morning (from Cheour et al., Nature 2002).

9.2. Normalization of speech sound perception as reflected by the MMN

However, most MMN studies with speech sounds used only one (acoustically constant) exemplar of each phonetic category; therefore, the MMN obtained in most of these studies could be due either to the phonetic-category change or to the mere acoustic change. In fact, as already mentioned, both types of change contribute to the MMN, the acoustic MMN being generated bilaterally and the purely phonetic MMN in the left hemisphere only (in most subjects) (Näätänen et al., 1997).

In order to eliminate the acoustic component, Dehaene-Lambertz and Peña (2001) used, in one of their conditions, 3 different female speakers for the 3 exemplars of a standard (/ta/) presented before a deviant (/pa/, or vice versa), which was always uttered by a fourth female speaker. Nevertheless, a mismatch response was obtained in sleeping newborns, and this response was very similar to that elicited in the same-speaker condition of the study. These remarkable results showed that even a sleeping infant’s brain continuously performs speech-sound normalization, a necessary prerequisite for correct speech perception (see also Phillips, 2001, Rivera-Gaxiola et al., 2000, Sharma and Dorman, 2000).

Moreover, Shestakova et al. (2002b) used 150 different male-voice exemplars of each of the three vowel categories involved (/a/, /u/, /i/). These vowels were presented in short sequences of each category with continuously varying exemplars (standards), with no break before the onset of the sequence of the next vowel (e.g., 10 exemplars of /a/, followed by 10 exemplars of /u/, followed by 10 exemplars of /i/). Hence, the first stimulus of each sequence served as a deviant. An MMNm response was obtained over the left hemisphere only; hence, the acoustic MMN component was abolished (see also Jacobsen et al., 2004b, for abstraction from F0 and intensity variation in the categorization of Klatt-synthesized phonemes).
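The logic of this category-switch design, in which the change of vowel category rather than any fixed acoustic token acts as the deviant, can be sketched as a stimulus-sequence generator. This is a minimal illustration under stated assumptions: the function name `category_run_sequence`, the fixed run length, the random choice of the next category, and the decision not to count the very first stimulus of a block as a deviant are hypothetical choices, not details taken from Shestakova et al. (2002b).

```python
import random


def category_run_sequence(categories, exemplars_per_category, run_length, n_runs, seed=0):
    """Hypothetical generator for a category-switch (roving-exemplar) design.

    Each run presents `run_length` stimuli of one vowel category, each drawn
    from a pool of `exemplars_per_category` acoustically different tokens.
    The first stimulus of every run after the first marks a category switch
    and is labelled as the deviant. Returns (category, exemplar, is_deviant).
    """
    rng = random.Random(seed)
    sequence = []
    previous = None
    for _ in range(n_runs):
        # Pick a category different from the previous run, so each run onset is a change.
        category = rng.choice([c for c in categories if c != previous])
        for position in range(run_length):
            # A fresh, randomly chosen exemplar for every stimulus: no token is a fixed standard.
            exemplar = rng.randrange(exemplars_per_category)
            is_deviant = previous is not None and position == 0
            sequence.append((category, exemplar, is_deviant))
        previous = category
    return sequence


# Example: three vowel categories, 150 tokens each, runs of 10 stimuli.
stimuli = category_run_sequence(["a", "u", "i"], exemplars_per_category=150, run_length=10, n_runs=6)
deviant_positions = [i for i, (_, _, dev) in enumerate(stimuli) if dev]
print(deviant_positions)  # [10, 20, 30, 40, 50]
```

Because the standards within a run are themselves acoustically variable, no repeated acoustic token is available as a reference; the only regularity that the deviant violates is the vowel category.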

Obviously, this left-hemispheric phonetic MMNm component depended on the presence of long-term memory traces for the mother-tongue vowels, which are able to identify the invariant vowel-identity code amid wide acoustic variation. This code must be identical, or almost identical, in any sound perceived as, for instance, /e/, irrespective of whether it is uttered by a male, female, or child voice. The core property of this phoneme-identity code must hence be its invariance amid very wide acoustic variation, suggesting that there must exist neuronal populations that detect such invariance. As this invariance cannot reside in the absolute level of any acoustic feature per se, it is probably relational in nature; i.e., phoneme identity might be based on fixed relations or ratios between the levels of certain acoustic features (Näätänen, 2001).

Importantly, the existence of such types of neuronal populations was demonstrated by Paavilainen et al. (1999). In one of their conditions, the within-pair frequency change was ascending both for the standard and deviant pairs but the standard pairs had a constant within-pair frequency ratio between the two tones (5 musical steps), whereas the deviant pairs either had a larger (7 or 8 steps) or smaller (2 or 3 steps) within-pair frequency ratio than that of the standards. Further, the stimulus pairs randomly varied over a wide frequency range. Nevertheless, an MMN was elicited by deviant pairs. Moreover, an analogous result was obtained even when the two tones were simultaneously presented as a complex tone consisting of two frequency components (with the frequency level of the stimuli again varying across an extensive frequency range).
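The stimulus logic of Paavilainen et al. (1999) can be illustrated numerically: if the standard interval is fixed as a frequency ratio while the base frequency roves, the ratio stays invariant even though no absolute frequency repeats. The sketch below assumes that one “musical step” is an equal-tempered semitone (ratio 2^(1/12)); the frequency range, deviant probability, and function names are hypothetical and not taken from the original study.

```python
import math
import random

# Assumption: one "musical step" = one equal-tempered semitone.
SEMITONE = 2 ** (1 / 12)


def tone_pair(base_hz, steps):
    """Ascending pair whose second tone lies `steps` semitones above the first."""
    return base_hz, base_hz * SEMITONE ** steps


def interval_in_steps(pair):
    """Recover the within-pair interval (in semitones) from the frequency ratio alone."""
    low, high = pair
    return 12 * math.log2(high / low)


def roving_pairs(n, p_deviant=0.1, low_hz=300.0, high_hz=1200.0, seed=1):
    """Hypothetical block: standards span 5 steps, deviants 2, 3, 7 or 8 steps."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Rove the base frequency (log-uniform) so no absolute frequency is predictive.
        base = low_hz * (high_hz / low_hz) ** rng.random()
        steps = rng.choice([2, 3, 7, 8]) if rng.random() < p_deviant else 5
        pairs.append((steps, tone_pair(base, steps)))
    return pairs


for steps, pair in roving_pairs(5):
    print(f"{pair[0]:7.1f} Hz -> {pair[1]:7.1f} Hz: {interval_in_steps(pair):.2f} steps")
```

Every standard pair yields 5 steps from its frequency ratio, whatever its absolute frequency; this relational regularity is what the deviant pairs violate.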

For an ingenious MMN test of the FUL (featurally underspecified lexicon) model of speech-sound perception in the presence of acoustic variation, see Eulitz and Lahiri (2004).

For recent reviews of MMN studies of phoneme perception, see Dehaene-Lambertz and Gliga (2004), Näätänen (2001), and Kraus and Cheour (2000).

9.3. An electrophysiological index of match with a speech-sound trace

One might wonder whether there is any electrophysiological sign of the standard stimulus matching the postulated speech-sound trace. The first electrophysiological sign of speech-sound-specific recognition activation was reported by Hoppe et al. (1996), who found a negativity peaking at about 170 ms (“N170”) from phoneme onset that was not elicited by the noise analogue of the stimulus. This bilateral (Rosanovski et al., 1999) “N170” is presumably generated when the sound-elicited process in the central auditory system encounters the corresponding recognition model, or trace, in the sensory-memory system, and it might hence reflect the read-out of the corresponding phonetic code into perception.

In addition, using sinusoidal tones, Näätänen and Rinne (2002) described a “repetition negativity”, which was elicited by a few consecutive repetitions of a tone in a randomized sequence of simple tones of different frequencies. This negativity was also found when complex tones were used. (See also Kaernbach et al., 1998, Berti et al., 2000, Wolff and Schröger, 2001a.) Moreover, Haenschel et al. (2005) described a slow positivity to repetitions (“repetition positivity”) of a non-speech standard stimulus and suggested that this positivity is a neurophysiological sign of memory-trace formation. In a subsequent study, Baldeweg et al. (2006) found that this positivity, and thus the MMN derived from the deviant-minus-standard difference wave, was amplified by nicotine.
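Since several of the effects discussed here are defined on difference waves, a minimal sketch of how an MMN is read off the deviant-minus-standard difference may help. It assumes that epochs are already baseline-corrected lists of single-channel amplitudes (e.g., µV at Fz) sampled at a fixed rate; the 100–250 ms search window and all function names are illustrative assumptions, not a standard analysis pipeline.

```python
def average(epochs):
    """Point-by-point mean of equal-length epochs (lists of amplitudes, e.g. in microvolts)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard average; the MMN shows up as a negativity in this wave."""
    return [d - s for d, s in zip(average(deviant_epochs), average(standard_epochs))]


def negative_peak(wave, sample_rate_hz, t0_ms, window_ms=(100.0, 250.0)):
    """(latency_ms, amplitude) of the most negative sample inside `window_ms`.

    `t0_ms` is the latency of the first sample relative to stimulus onset,
    e.g. -100.0 for epochs that start with a 100-ms prestimulus baseline.
    The default window is an illustrative assumption, not a fixed standard.
    """
    best = None
    for i, value in enumerate(wave):
        t = t0_ms + 1000.0 * i / sample_rate_hz
        if window_ms[0] <= t <= window_ms[1] and (best is None or value < best[1]):
            best = (t, value)
    return best


# Synthetic check: deviants carry a -3 uV deflection at 150 ms; standards are flat.
fs, t0, n = 1000.0, -100.0, 500
dev = [0.0] * n
dev[250] = -3.0
print(negative_peak(difference_wave([dev, dev], [[0.0] * n] * 3), fs, t0))  # (150.0, -3.0)
```

Because the MMN is measured from a difference, any change that makes the standard response more positive, such as the repetition positivity described by Haenschel et al. (2005), also deepens the difference wave even when the deviant response itself is unchanged.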

The phoneme traces discussed above might also provide the sensory information needed for producing and controlling pronunciation (Näätänen, 2001). Trying to learn the correct pronunciation of a foreign speech sound could be described as an iterative process whose goal is to produce a pronunciation that matches the sensory information encoded in the trace of the target speech sound. In this exercise, one presumably tries to activate one’s own phoneme, syllable, and word traces with one’s own voice, using this activation as feedback. Therefore, the accuracy of the sensory information in these traces for foreign speech sounds probably sets the upper limit for an individual’s accuracy in pronouncing these sounds. Hence, when attempting to improve pronunciation, one should first try to improve the informational accuracy of one’s own speech-sound traces.

9.4. The MMN and the “phonological mismatch negativity” (PMN)

When an attended linguistic input violates contextual expectations, for example, when the subject reads or hears the sentence “the pizza was too hot to sing”, a negative ERP response called the N400 is elicited by the incongruent word (“sing” in this example) (Kutas and Hillyard, 1980, Kutas and Hillyard, 1984, Connolly and Phillips, 1994, Hagoort and Brown, 2000). Thus, the N400 appears to index linguistic processes activated by a semantic mismatch between the expected linguistic input and the perceived word. In the auditory modality, however, the N400 is preceded by an earlier negative response called the phonological mismatch negativity (PMN; Connolly and Phillips, 1994, Hagoort and Brown, 2000, Revonsuo et al., 1998, van den Brink et al., 2001). The PMN appears to be generated by a mismatch between the phonological structure of the expected word and that of the perceived word. For instance, a word that is not the most probable ending of a sentence but still ends it logically elicits a PMN (e.g., the last word of the sentence “when the power went out the house became quiet”, “dark” being the most expected ending). The distinction between the PMN and N400 is also supported by differences in their scalp topographies (Connolly et al., 2001, D’Arcy et al., 2004) and by the fact that isolated words and non-words in priming and phoneme-deletion tasks elicit the PMN (Connolly et al., 2001, Newman et al., 2003). In a recent MEG study, Kujala et al. (2004) found that the PMN was generated in the anterior parts of the temporal cortex in response to isolated spoken words and non-words that differed from the word or non-word mentally created by the subject in a rhyming task. No study so far has directly compared the PMN source loci with those of the MMN elicited by a phoneme change.
Nevertheless, the fact that attention to auditory input is needed for the PMN to be elicited strongly suggests that different auditory-cortex mechanisms underlie these responses.

10. The MMN for higher-order linguistic processes

10.1. The MMN as an index of mother-tongue syllable and word traces

Importantly, the MMN can also be used to probe the memory representations of higher-order linguistic units. MMN evidence for memory traces of mother-tongue syllables was reported by Shtyrov et al. (1998, 2000) and Alho et al. (1998a), while similar evidence for morphemes was obtained by Shtyrov and Pulvermüller (2002a, 2002b) and Shtyrov et al. (2005). Furthermore, Korpilahti et al. (2001), Pulvermüller et al. (2001a), and Shtyrov and Pulvermüller (2002a, 2002b) were the first to show that even the memory traces of mother-tongue words can be probed with the MMN. For instance, Pulvermüller et al. (2001a), using Finnish subjects instructed to ignore the sounds and watch a silent movie, found that the MMN to the same spoken Finnish syllable as a deviant stimulus was larger in amplitude when it ended a Finnish word than when it ended a pseudoword. This enhancement, reaching its maximum amplitude at about 150 ms from the word’s recognition point (Marslen-Wilson, 1987), did not occur in foreign subjects who did not know Finnish (Fig. 9). In addition, subsequent MMNm recordings showed that the major intracranial source of this word-related MMN was located in the left superior temporal lobe.


Fig. 9. The MMNs elicited by the critical syllables /ki/ (A) and /ko/ (B) when placed in a word context (dark traces) and in a pseudo-word context (light). The acoustic waveforms of these syllables which elicited the MMNs are shown at the top. Data from native Finnish speakers are presented in the upper plots, those from foreigners appear at the bottom. A word-related MMN enhancement occurred in Finnish speakers but not in foreigners (from Pulvermüller et al., 2001a).

These results hence suggested that the MMN (MMNm) can reflect the activation of the permanent neuronal memory traces for mother-tongue words. The memory traces for words are likely to be realized as distributed, tightly connected populations of neurons (Pulvermüller, 1999, Pulvermüller, 2001, Shtyrov and Pulvermüller, 2002a, Shtyrov and Pulvermüller, 2002b) which become fully active, or “ignite,” when words are being processed (for a discussion, see Pulvermüller, 1999). In contrast, after the presentation of a pseudoword that does not occur in the usual language input, the ignition process would fail to emerge.

The relevant differences between the word and pseudoword contexts emerged as early as about 150 ms after the word-recognition point. This is consistent with results (e.g., Pulvermüller et al., 1995, Skrandies, 1998) indicating that the neuronal counterparts of words become active early, within the first quarter of a second after the relevant information appears in the input.

Pulvermüller (1999) points out that if two syllables form a word, they must frequently occur in succession, whereas if they form a pseudoword, they may never, or only extremely rarely, occur in direct succession. Therefore, according to him, it might be the co-occurrence or correlation statistics of the two syllables in the stimulus words and pseudowords that was the relevant factor reflected in the MMN amplitude, rather than the word or pseudoword status of the stimuli per se. Only for frequently occurring sequences, such as the phoneme and syllable sequences that represent words of a language, can the CNS build up a neuronal representation. Thus, if a syllable combination is a word, it is necessarily characterized by an enhanced conditional probability of its component syllables. [See also Bonte et al. (2005), to be reviewed later.]
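Pulvermüller’s argument turns on the conditional probability of the second syllable given the first. A toy estimate from syllable-pair counts makes the idea concrete. The corpus and syllables below are invented for illustration, and the counting scheme (plain relative frequencies, no smoothing) is an assumption, not a method reported in the studies cited.

```python
from collections import Counter


def transitional_probabilities(syllabified_words):
    """Relative-frequency estimate of P(next syllable | current syllable), no smoothing."""
    current_counts = Counter()
    pair_counts = Counter()
    for syllables in syllabified_words:
        for current, nxt in zip(syllables, syllables[1:]):
            current_counts[current] += 1
            pair_counts[(current, nxt)] += 1
    return {pair: count / current_counts[pair[0]] for pair, count in pair_counts.items()}


# Invented toy corpus: "ba-ku" is frequent, "do-ku" never occurs (pseudoword-like).
TOY_CORPUS = [["ba", "ku"]] * 3 + [["ba", "ti"]] + [["do", "ti"]] * 2
tp = transitional_probabilities(TOY_CORPUS)
for pair in [("ba", "ku"), ("ba", "ti"), ("do", "ti"), ("do", "ku")]:
    print("-".join(pair), tp.get(pair, 0.0))
```

Here “do-ku” never occurs, so its estimated transitional probability is 0, the analogue of a pseudoword whose syllables rarely or never follow one another, whereas the frequent “ba-ku” receives a high conditional probability.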

These results were corroborated by Shtyrov and Pulvermüller (2002a, 2002b), Pettigrew et al. (2004b, 2005), and Sittiprapaporn et al. (2003). Furthermore, Pulvermüller et al. (2003) found that the word-related MMNm enhancement originated from two left-hemispheric activation loci, one superior-temporal and the other inferior-frontal, peaking at 136 and 158 ms, respectively, after the word-recognition point. These areas are known to be crucial for language processing in most right-handed individuals (Zatorre et al., 1994, Price, 2000).

Evidence for the word-trace-related MMN enhancement was also obtained by Pulvermüller et al. (2004). Interestingly, they also found that this MMN enhancement had different topographies for the two individual Finnish words used. A bilateral, though left-predominant, temporo-parieto-occipital set of generators accounted for the word lakko (strike), whereas the word lakki (cap) primarily activated widespread generators in the right hemisphere. These specific generators were activated by the words in addition to the known sources in the perisylvian language areas, which are activated by words and pseudowords alike (Pulvermüller et al., 2001b). Thus, generators located outside the left-hemispheric core language areas of Broca and Wernicke appeared to contribute to the word-related MMN enhancement. This is consistent with Pulvermüller’s (1999) suggestion that words are represented in the brain by distributed neuron webs linking phonological information, mostly stored in the left-hemispheric language areas, to semantic information involving additional cortical areas in both hemispheres.

The temporal dynamics of the MMN enhancement for words may also be of theoretical interest. Modular models of language processing postulate a staged or cascaded access to the different types of linguistic information in word perception and comprehension. Accordingly, a word would first be physically analyzed. Then, its phonological pattern would be extracted, which would be followed by lexical access, i.e., the lookup of the word form in a mental dictionary. Finally, semantic information associated with the word would be activated.

In contrast to these modular theories, Marslen-Wilson and Tyler (1975) suggested, on the basis of behavioural data, that phonological, lexical, and semantic information about a word is accessed near-simultaneously as soon as the information in the sensory input allows word identification. This view was supported by Pulvermüller et al. (2004). In their study, the ERP differences between the two critical syllables used, [ki] versus [ko], appeared in the same time interval as those between the word and pseudoword contexts, and these, in turn, were present simultaneously with the between-word differences in the MMN-enhancement topographies. Whereas physical stimulus analysis and linguistic processing may be serial (Assadollahi and Pulvermüller, 2001, Assadollahi and Pulvermüller, 2003, Rinne et al., 1999a), Pulvermüller et al.’s (2004) ERP data indicate that access to phonological, lexical, and possibly even semantic information about a word occurs nearly simultaneously in the brain (see also Pulvermüller, 2001). The authors concluded that the activation of the memory traces for specific spoken words starts as early as about 100–150 ms after the input becomes sufficient for word recognition, i.e., after the word-recognition point.

Further data indicating word- and word-category-specific topographies of the word-related MMN enhancement were obtained by Shtyrov et al. (2004) and Pulvermüller et al. (2005), who recorded auditory ERP responses to movement-related English words in an ignore condition. The MMN to action words had an unusual centro-posterior distribution, suggesting that this activity was, at least in part, generated posterior to the sources of the usually observed frontal MMN. Moreover, responses to the hand-related word deviant (pick) had a more widespread lateral distribution, whereas the leg-related deviant (kick) elicited a more focal dorsal negativity. These differences, remarkably reminiscent of the somatotopic organization of the sensorimotor cortex, were explained in terms of distributed neuronal assemblies that function as category-specific memory traces for words and might involve sensorimotor cortical structures for encoding action words. The effects occurred early, suggesting that semantic processing may commence as early as ∼140 ms after word onset.

Hence, it appeared that words are encoded in the brain by distributed neural networks encompassing cortical structures beyond the core language areas (Pulvermüller, 2001, Martin and Chao, 2001; see also Pulvermüller, 2005). This is also supported by studies of patients with specific semantic deficits, indicating that body-part knowledge is a distinct and dissociable semantic category that can be selectively preserved or impaired (see Coslett et al., 2002).

Furthermore, Endrass et al. (2004) found that bilateral redundant stimulus presentation further enhanced the MMN amplitude for words relative to both unilateral presentation modes. This bilateral redundancy gain was absent for pseudowords. According to the authors, this was the first study to reveal a bilateral redundancy effect for spoken words. They stressed that only learned stimuli, such as words or familiar faces, yield a bilateral advantage in behavioural tasks, and therefore suggested that the corresponding MMN enhancement can be explained within cell-assembly theory. Accordingly, no summation effect would be expected for complex unfamiliar stimuli, such as pseudowords or unfamiliar faces, as there is no corresponding long-term memory trace. In contrast, strong summation effects following redundant bilateral presentation can be expected for stimuli that are cortically represented by distributed neuronal ensembles with strong internal connections (Pulvermüller and Mohr, 1996, Mohr and Pulvermüller, 2002).

In a fully counterbalanced study using a cross-linguistic approach to control for stimulus effects, Jacobsen et al. (2004b) found an enhanced MMN to deviants presented in a language-familiar context, rather than a larger MMN to lexical than to non-lexical deviants. They subsequently showed that this familiar-context effect is not limited to words but appears to be grounded in more general mechanisms for processing familiar sounds (Jacobsen et al., 2005). Addressing the role of the lexicality of the standard, however, Diesch et al. (1998) found that the MMNm amplitude for non-word deviants was larger when the standards were phonological non-words than when they were words. Moreover, with word standards, the MMNm dipole for non-word deviants was located more laterally than when these deviants were presented among non-word standards.

10.2. The MMN as an index of language laterality

MMN studies (e.g., Gootjes et al., 1999, Mathiak et al., 1999, Näätänen et al., 1997, Kasai et al., 2001) have also contributed significantly to the question of the predominantly left-hemispheric lateralization of language function, which is still not fully understood (for reviews, see Hugdahl, 2002, Tervaniemi and Hugdahl, 2003). The two main views are that laterality is best explained by (1) a left-hemispheric cortical specialization for processing spectrally rich and rapidly changing sounds (Tallal et al., 1993, Zatorre and Belin, 2001), or (2) a predisposition of one hemisphere to develop a module for phonemes (Liberman and Whalen, 2000, Whalen and Liberman, 1987). Shtyrov et al. (2005) tested these two views in a passive oddball paradigm by recording the MMNm to the same brief acoustic stimulus, presented as a deviant in contexts where it was perceived either as a noise burst with no resemblance to speech or as the native-language sound /t/ forming part of a meaningless pseudoword. In a further condition, the same acoustic element was placed in a word context. Left laterality was maximal when the critical [t] stimulus was in a word context and served as an inflectional affix conveying grammatical information. Laterality was significantly reduced when the [t] sound had no grammatical function, a finding not easily explained by the motor theory of speech perception or by other purely phonological approaches to laterality. As laterality was entirely absent for the MMN elicited by the spectrally rich sound, these data also challenge physical/acoustic theories of laterality, although a null effect in an imaging experiment never constitutes strong evidence against a model. What Shtyrov et al.’s (2005) data strongly suggest is that, apart from physical and phonological factors, the left laterality of the brain response is also influenced by grammatical and serial-order mechanisms.
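Laterality in such studies is often summarized by a normalized left-right index. The function below uses the common (L - R)/(L + R) convention on response magnitudes; this convention, the function name, and the example values are assumptions for illustration only, since the text does not specify how Shtyrov et al. (2005) quantified laterality.

```python
def laterality_index(left, right):
    """Normalized asymmetry (L - R) / (L + R) computed on response magnitudes.

    Magnitudes are used because EEG MMN amplitudes are negative, whereas MEG
    source strengths are often reported as positive values.
    +1 = purely left, 0 = symmetric, -1 = purely right.
    """
    left, right = abs(left), abs(right)
    total = left + right
    if total == 0:
        raise ValueError("index undefined when both responses are zero")
    return (left - right) / total


# Hypothetical MMNm source strengths (nAm) for two stimulus contexts.
print(laterality_index(30.0, 10.0))  # 0.5: left-lateralized
print(laterality_index(12.0, 12.0))  # 0.0: bilateral
```

Normalizing by the summed response keeps the index comparable across conditions whose overall MMN amplitude differs, so a change in laterality is not confounded with a change in response size.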

Hence, according to the authors, these results demonstrate that language laterality is bound to the processing of such brief sounds as units of frequently occurring meaningful items, and can thus be linked to learning and memory-trace formation for such items rather than to their physical or phonological properties; i.e., it was the activation of these memory networks for the known items that produced the larger left-right asymmetry for words than for non-words. The authors stressed that a prominent left-hemispheric dominance was seen only for stimuli familiar to the central nervous system, which had previously developed memory networks for them: the identical rapid complex sound did not produce significant laterality when presented in the context of meaningless items, regardless of whether these items were speech or not. According to the authors, this rules out an explanation of their data in terms of the phonological approach to laterality (Liberman and Whalen, 2000).

In a previous study, Shtyrov et al. (1999) found that complex deviants with fast transitions that were not heard as speech sounds (when presented among standards of the same type) elicited a bilateral rather than a left-predominant MMNm, whereas an acoustically matched consonant–vowel syllable elicited a left-predominant one. Furthermore, this bilateral MMNm became right-hemispherically predominant when the complex non-speech sounds (both standards and deviants) were made slower. Interestingly, the same shift of the MMNm towards right-hemispheric lateralization also occurred when speech sounds were presented in noise.

It was also recently shown that the association between a sound and a body action may play a role in the laterality of the MMN: Hauk et al. (2006) found that click sounds produced with the tongue elicited a bilateral MMN, whereas a click sound produced by snapping the fingers gave rise to a left-lateralized MMN with a main source consistent with the hand motor cortex. Interestingly, subjects reported that their preferred hand for finger snapping was the right one, which makes it plausible that sound–motor associations contributed to laterality.

Rinne et al. (1999a), too, aimed at determining the stimulus prerequisites for the left lateralization of the MMN to speech sounds. The authors generated two continua of 8 auditory stimuli each. One continuum ranged from a semisynthetic Finnish vowel /a/ (as in “but”) to the corresponding pure tone, produced by passband filtering the vowel /a/. This tone was then complemented step by step by widening the filter until it became the original /a/ again. The other continuum, produced in an analogous way, had the Finnish /i/ at one end and the corresponding tone at the other. Native Finnish speakers were presented with a repetitive standard stimulus taken from the /a/ continuum, each of the 8 sounds serving as the standard in a separate condition. In each condition, the standard was infrequently replaced by the corresponding, equally complex sound from the /i/ continuum. The MMN became left-lateralized at the point along the continua at which the two stimuli began to be perceived as vowels.
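The continuum construction can be illustrated with a simplified model in which a vowel is approximated by a harmonic complex and band-pass filtering keeps only the harmonics inside a passband around a centre frequency; widening the passband step by step moves the stimulus from a near-pure tone back towards the full complex. The Python sketch below is hypothetical throughout: the fundamental frequency, centre frequency, bandwidth steps, and the use of an ideal (brick-wall) filter are illustrative choices, not the parameters of Rinne et al. (1999a), whose semisynthetic vowels also had formant-shaped spectra that this sketch ignores.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumption)
F0 = 100.0           # hypothetical fundamental frequency, Hz
MAX_FREQ = 4000.0    # highest harmonic included in the "vowel", Hz
CENTRE = 700.0       # hypothetical passband centre (roughly an /a/ F1), Hz


def harmonics(f0: float = F0, max_freq: float = MAX_FREQ) -> list[float]:
    """Frequencies of all harmonics of f0 up to and including max_freq."""
    n = int(max_freq // f0)
    return [k * f0 for k in range(1, n + 1)]


def continuum(n_steps: int = 8, centre: float = CENTRE) -> list[list[float]]:
    """Harmonic content of each step of a tone-to-vowel continuum.

    An ideal band-pass filter is modelled by keeping only the harmonics whose
    distance from `centre` lies within the current half-bandwidth. `centre`
    is assumed to coincide with a harmonic, so step 0 passes exactly one
    component (a pure tone); the last step passes every harmonic.
    """
    comps = harmonics()
    widest = max(abs(f - centre) for f in comps)
    steps = []
    for i in range(n_steps):
        # half-bandwidth grows linearly from 0 Hz to the widest value needed
        half_bw = widest * i / (n_steps - 1)
        steps.append([f for f in comps if abs(f - centre) <= half_bw])
    return steps


def synthesize(freqs: list[float], duration: float = 0.2) -> list[float]:
    """Sum of equal-amplitude sine components (no normalization or ramps)."""
    n = int(SAMPLE_RATE * duration)
    return [sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in freqs)
            for t in range(n)]


for i, comps in enumerate(continuum()):
    print(f"step {i}: {len(comps)} harmonic(s), {min(comps):.0f}-{max(comps):.0f} Hz")
```

In the study itself, the question was at which step along such continua the MMN to the /a/-versus-/i/ contrast became left-lateralized.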

Consistent with this, Tremblay et al. (1997) observed that while MMNs elicited by nonnative speech syllables were initially symmetric, they became enhanced, in particular over the left hemisphere, following training (see also Tremblay and Kraus, 2002). A similar shift to left-hemisphere predominance was observed by Kujala et al. (2003) for deviant Morse patterns as a result of intensive Morse-code training. In addition, Sharma and Kraus (1995) found that the MMN elicited by the syllable /da/ was larger in amplitude over the left than the right hemisphere when /da/ signaled a phonetic change but was symmetric when the same /da/ signaled a pitch change. Moreover, Zhang et al. (2000) found an MMNm-amplitude enhancement over the left hemisphere, and a parallel amplitude attenuation over the right hemisphere, as a Japanese subject learned to discriminate the English /la/ and /ra/. For further related studies, see Zhang et al. (2005), Koyama et al. (2000a, 2000b), and Sittiprapaporn et al. (2003).

Similarly, Sharma et al.’s (1994) patient with a left temporal-lobe lesion was able to discriminate a pitch change behaviourally in a normal manner, yet his perception of the phonetic contrast with an equivalent pitch change was severely impaired (see also Aaltonen et al., 1993). Consistent with this, as already reviewed, the MMN was present when /da/ signaled a pitch contrast but absent when it signaled a phonetic difference (for a review, see Kraus and Cheour, 2000).

Taken together, these results indicate that a number of factors may contribute to the laterality of the MMN. Physical factors can play a role as well as phonological ones, but to obtain the full-fledged pattern of language laterality, it seems necessary that words with grammatical endings be processed. This finding is consistent with earlier observations (Neville et al., 1992, Pulvermüller et al., 1995) that the laterality of brain responses is strongest for grammatical words, the so-called function (or closed-class) words, and suggests that, apart from the physical and phonological determinants of laterality, sequential processes and syntax play a role as well. As an additional factor, the association of sounds with motor actions appears to co-determine laterality. These results are in line with a memory-trace theory of the MMN, according to which the laterality of the MMN would be explained in terms of lateralized memory traces for stimulus-feature sets, phonological units (speech sounds), sensorimotor processing (sound–action links), and grammatical processing (see also the section on grammar processing below).

10.3. The MMN as an index of grammar processing

MMN data also provide evidence for the automatic processing of grammar. In the studies of Pulvermüller and Shtyrov (2003) and Shtyrov et al. (2003), subjects watching a silent video film and ignoring the speech stimuli were presented with grammatically correct sentences as rare deviant stimuli (probability 16.7%) among ungrammatical word strings used as frequent standard stimuli. In the reverse design, the ungrammatical items were the rare deviants and the grammatical sentences the frequent standards. Whereas the brain responses to the frequently presented standard stimuli did not distinguish between grammatical and ungrammatical items, the MMN was significantly enhanced for the violations as compared with that elicited by correct sentences. This grammaticality effect had its main source in the left frontal cortex. Because the effect appeared in the responses to deviants but not in those to standards, it indicates that the MMN mechanism is at work when these grammar effects are elicited. The authors related the syntactic MMN to the differential activation of neuronal memory traces for grammatical word sequences (called “sequence detectors”; see Pulvermüller and Shtyrov, 2003, Pulvermüller, 2002). According to the authors, the study demonstrated for the first time that the brain detects grammatical violations even when subjects are instructed to direct their attention away from the language input, i.e., that early syntax processing in the human brain may take place outside the focus of attention.

These results were confirmed and extended by Shtyrov et al. (2003), who found that occasional syntactically incorrect phrases presented as deviants elicited larger-amplitude MMN responses than correct phrases presented as deviants. This grammar-violation-dependent MMN enhancement originated from the left temporal cortex, suggesting that this brain structure may play an important role in automatic grammar processing.
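The counterbalanced oddball design described above (Pulvermüller and Shtyrov, 2003, Shtyrov et al., 2003), in which each item type serves as the rare deviant (probability 16.7%, i.e., 1/6) in one block and as the frequent standard in the other, can be sketched in Python as follows. The sequence length, the minimum number of standards between successive deviants, and the placeholder item labels are hypothetical choices, not parameters taken from the original studies.

```python
import random


def oddball_sequence(n_trials: int, p_deviant: float, min_gap: int,
                     standard: str, deviant: str, seed: int = 0) -> list[str]:
    """Pseudo-random oddball sequence with a fixed number of deviants.

    Every deviant is preceded by at least `min_gap` standards, so that no two
    deviants occur in close succession (a hypothetical constraint).
    """
    rng = random.Random(seed)
    n_dev = round(n_trials * p_deviant)
    spare = n_trials - n_dev * (min_gap + 1)
    if spare < 0:
        raise ValueError("too many deviants for the requested minimum gap")
    # distribute the remaining standards randomly over the n_dev + 1 slots
    slots = [0] * (n_dev + 1)
    for _ in range(spare):
        slots[rng.randrange(n_dev + 1)] += 1
    seq: list[str] = []
    for i in range(n_dev):
        seq.extend([standard] * (min_gap + slots[i]))
        seq.append(deviant)
    seq.extend([standard] * slots[n_dev])
    return seq


# Design 1: grammatical items as rare deviants among ungrammatical standards.
design_1 = oddball_sequence(600, 1 / 6, 2, "UNGRAMMATICAL", "GRAMMATICAL")
# Design 2 (reverse): the roles of the two item types are swapped.
design_2 = oddball_sequence(600, 1 / 6, 2, "GRAMMATICAL", "UNGRAMMATICAL")
print(design_1.count("GRAMMATICAL") / len(design_1))
```

Fixing the number of deviants (rather than drawing each trial independently) keeps the deviant probability exactly at the intended value in every block.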

The syntactic MMN revealed by a number of studies (Menning et al., 2005; Pulvermüller and Shtyrov, 2003, Shtyrov et al., 2003; see also Gunter et al., 2000, Hahne and Friederici, 1999, Friederici et al., 1996, Friederici et al., 1999, Friederici et al., 2000) is an early left-anterior negativity similar to the ERP responses previously associated with the processing of grammatical violations (Neville et al., 1991, Friederici et al., 1993). These “early left-anterior negativity (ELAN)” responses can be separated from the late ERPs that are also elicited by word sequences placing strong demands on the grammatical processing system. In contrast to the early lateralized negativities, the late responses are not lateralized and require attention in order to be elicited (for discussion, see Pulvermüller and Shtyrov, 2006). Their lateness and lack of laterality, in particular, raise the question of whether they reflect genuine grammatical processes or rather late processes of reinterpreting an apparently incoherent word string. The earlier suggestion that only the late responses might be sensitive to specific types of grammatical violations, especially violations of agreement between subject and predicate, cannot be maintained, as early left-anterior negativities and syntactic MMNs can also be elicited by this kind of violation (see also Münte et al., 1998).

Recently, Menning et al. (2005), corroborating the results of Shtyrov et al. (2003) and Pulvermüller and Shtyrov (2003), found that MMNm responses were considerably larger in amplitude for syntactic and semantic errors than for mere phonemic deviations and, further, that semantic errors elicited larger-amplitude MMNm responses than syntactic errors did. The authors interpreted the error-sensitive MMN response as reflecting a very early detector for semantic and syntactic errors, both occurring in the same time window. According to the authors, these results are of great interest because one of the most fundamental claims of linguistics is that semantics and syntax constitute separate types of information. “Syntax-first” models (e.g., Frazier and Clifton, 1996) often assume serial processing in which certain syntactic operations precede semantic processing, stressing that initial structure building based on syntactic principles is immune to semantic or pragmatic information (see Friederici et al., 1993). In contrast, constraint-based models assume the parallel processing of all information sources and allow for interactions between them (see Marslen-Wilson and Tyler, 1975, Trueswell et al., 1993). Menning et al. (2005) concluded that their results call for a more parallel view of syntactic and semantic processing (Garnsey et al., 1997), suggesting that the MMN may reflect semantic violations, too, and at a considerably shorter latency than the classical N400 wave (Kutas and Hillyard, 1980).

10.4. The MMN as an index of phonotactic probability

Bonte et al. (2005) recently found that the MMN also reflects phonotactic probability, i.e., the distributional frequency of phoneme combinations in the listener’s mother tongue. Subjects were presented with pairs of non-words that differed from one another in phonotactic probability, using a modified passive oddball design that minimized the contribution of acoustic processes. A non-word with a high phonotactic probability in Dutch (notsel) elicited a larger-amplitude MMN than a non-word with a low phonotactic probability (notkel). Thus, auditory cortical MMN responses to phoneme clusters appear to be modulated by the statistical regularities of phoneme combinations. This finding is of particular interest because the neural correlates of phonotactic probability are largely unexplored. According to the authors, it may reflect auditory cortical tuning to the distributional frequencies of phoneme clusters in the language environment and may be related to the extensively reported behavioural finding that stimuli with a high phonotactic probability are easier to acquire, recognize, and memorize (see Auer and Luce, 2003).

Why, then, would a non-word with a high phonotactic probability lead to an enhanced MMN response? Bonte et al. (2005) referred to Hebbian associative learning principles, according to which frequently co-occurring events lead to the formation of neural memory representations in which these events are represented together. Consistent with this, previous studies (e.g., Nelken, 2004) revealed neural changes in animal primary auditory cortex reflecting the distributional frequencies of simple acoustic features. “The present MMN findings suggest that the frequent exposure to certain phoneme sequences during development, i.e., those with a high phonotactic probability, may lead to enhanced auditory cortical responses and, possibly, to the formation of auditory cortical memory traces. Alternatively, our results may reflect a combination of experience-dependent phonological and basic acoustic influences related to universal principles of phonotactics. It is important to note that these two factors are not independent. Phoneme combinations that are perceptually more distinctive and/or easier to articulate tend to occur more frequently across languages and may thus have a higher phonotactic probability.” (Bonte et al., 2005, p. 2773).
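Phonotactic probability is commonly estimated from corpus counts of phoneme sequences, for example as the relative frequency of the adjacent phoneme pairs (biphones) that a word contains. The Python sketch below assumes this simple, position-independent biphone measure and an invented toy lexicon in which letters stand in for phonemes. It is not the measure or the Dutch corpus used by Bonte et al. (2005); it only illustrates how a cluster such as -ts- can score higher than -tk- when it occurs more often in the language input.

```python
from collections import Counter


def biphone_counts(lexicon: dict[str, int]) -> Counter:
    """Frequency-weighted counts of adjacent symbol pairs across a lexicon."""
    counts: Counter = Counter()
    for word, freq in lexicon.items():
        for a, b in zip(word, word[1:]):
            counts[a + b] += freq
    return counts


def mean_biphone_probability(word: str, counts: Counter) -> float:
    """Mean relative frequency of the word's adjacent pairs (unseen pairs count 0)."""
    pairs = [a + b for a, b in zip(word, word[1:])]
    if not pairs:
        raise ValueError("word must contain at least two symbols")
    total = sum(counts.values())
    return sum(counts[p] / total for p in pairs) / len(pairs)


# Invented toy lexicon {word: frequency}; letters stand in for phonemes.
toy_lexicon = {"notse": 3, "tsel": 5, "sel": 4, "tka": 1, "kel": 1}
counts = biphone_counts(toy_lexicon)
high = mean_biphone_probability("notsel", counts)
low = mean_biphone_probability("notkel", counts)
print(high > low)  # True
```

Because the two non-words share all pairs except -ts-/-se- versus -tk-/-ke-, the score difference here comes entirely from the frequency of those clusters in the toy input.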

For a recent review of the contribution of the MMN (MMNm) to our understanding of the brain mechanisms of language processing, see Pulvermüller and Shtyrov (2006); see also Osterhout et al. (1997).

10.5. The MMN in service of voice discrimination/identification

Titova and Näätänen (2001) proposed a novel application of the MMN, using it as an index of voice similarity/dissimilarity. They presented a female voice as the standard stimulus, whereas the deviant stimuli included one male voice and three female voices. Significant positive correlations were found between MMN amplitude and the dissimilarity ratings. Moreover, MMN amplitude proved to be a more reliable indicator of voice identity than the behavioural dissimilarity ratings. Furthermore, as no MMN is elicited when a deviant is identical to the standard, the authors proposed the MMN as a tool for determining speaker identity. (In fact, they had a fifth “deviant”, identical to the standard, which of course elicited no MMN.)
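The MMN is conventionally obtained by subtracting the averaged ERP to the standard from the averaged ERP to the deviant, which is why a "deviant" identical to the standard yields no MMN. The Python sketch below assumes that both ERPs are already averaged and sampled on the same time base, and it picks the most negative point of the difference wave in a hypothetical 100–250 ms window. Latency windows, electrode choices, and amplitude measures (peak versus mean amplitude) vary between studies, and the function names are illustrative, not those of any particular analysis package.

```python
def difference_wave(deviant_erp: list[float], standard_erp: list[float]) -> list[float]:
    """Deviant-minus-standard ERP, computed sample by sample.

    Both inputs are assumed to be averaged ERPs on the same time base.
    """
    if len(deviant_erp) != len(standard_erp):
        raise ValueError("ERPs must have the same number of samples")
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


def mmn_peak(diff: list[float], sample_rate: float, t0: float,
             window: tuple[float, float] = (0.100, 0.250)) -> tuple[float, float]:
    """Most negative value of a difference wave inside a latency window.

    `t0` is the time (s) of the first sample relative to stimulus onset.
    Returns (amplitude, latency_s); if the wave is never negative inside the
    window, (0.0, nan) is returned, i.e. no MMN-like deflection is found.
    """
    best_amp, best_lat = 0.0, float("nan")
    for i, value in enumerate(diff):
        t = t0 + i / sample_rate
        if window[0] <= t <= window[1] and value < best_amp:
            best_amp, best_lat = value, t
    return best_amp, best_lat


# A "deviant" identical to the standard gives a flat difference wave.
print(difference_wave([1.0, -2.0, 0.5], [1.0, -2.0, 0.5]))  # [0.0, 0.0, 0.0]
```

With real recordings, residual noise means the difference wave for an identical "deviant" is only approximately flat, so statistical testing against zero is needed rather than an exact-zero check.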

10.6. The MMN as an index of voice familiarity

Beauchemin et al. (2006) recently found that the MMN and P3a are enhanced in amplitude for familiar-voice deviants relative to unfamiliar-voice deviants. This enhancement was not observed in another group of subjects for whom neither deviant voice was familiar. The standard was the vowel /a/ (as in ‘allo’, the French word for ‘hello’), and the two deviants, presented in the same stimulus block, consisted of this vowel uttered either by a familiar voice (the subject’s friend or relative) or by an unfamiliar voice. The authors concluded that these findings tentatively suggest that “specialized areas for voice processing are especially tuned to familiar voices as opposed to unfamiliar voices” (Beauchemin et al., 2006, p. 3085) and, further, that there is some degree of preattentive voice-familiarity evaluation modulating behavioural discrimination. Hence, this study showed that, in addition to long-term speech-sound memory traces, the MMN can also be used to probe, and hence to demonstrate the presence of, purely acoustic long-term memory traces.

11. The MMN and musical stimuli

Studies using the MMN have also contributed significantly to our understanding of music perception and enjoyment. In their recent review, Tervaniemi and Brattico (2004) concluded that, by using the MMN, we can look into the separate submodules of music perception, “examining with optimal time resolution the dynamic stages of information processing, as well as their automaticity and possible top-down modulation (which may be determined, for example, by implicit or explicit knowledge of musical sounds)” (p. 15). One important basic finding, reported by Tervaniemi et al. (1997b), Toiviainen et al. (1998), Goydke et al. (2004), and Caclin et al. (2006), was that the MMN is elicited by a timbre change. According to Tervaniemi and Brattico (2004), this and related findings highlight the ability of the auditory cortex to differentiate sounds according to a multidimensional attribute such as timbre, which is acoustically quite complex but forms a common cue in our daily auditory scene for differentiating sounds, even those with the same spatial origin and time course (see, e.g., Bregman, 1990).

Moreover, Tervaniemi et al. (2000b) found that the MMN elicited by a pitch change among harmonically rich sounds is larger in amplitude and earlier in latency than that elicited by a pitch change among pure sinusoidal tones and, further, that behavioural discrimination of the pitch change was more accurate for harmonically rich sounds than for pure tones. Thus, adding acoustical information to a sound signal seems to facilitate its neural encoding rather than delaying or complicating it.

In addition, as already mentioned, several studies (e.g., Tervaniemi et al., 1994a, Tervaniemi et al., 2001; for a review, see Näätänen et al., 2001) showed that the memory system reflected by the MMN encodes sound information far beyond the acoustic sound properties. Such findings, reviewed in detail in the next section, indicate that the auditory cortex automatically encodes, irrespective of whether the subject is listening or not, relatively invariant, abstract sound information, such as the melodic, harmonic, and rhythmic information of which music is typically composed. The processing of musical stimuli usually shows right-hemispheric weighting, as indicated by MEG (Tervaniemi et al., 1999, Maess et al., 2001), PET (Tervaniemi et al., 2000a), and fMRI studies (see Koelsch and Siebel, 2005).

Of particular importance is the sensitivity of the MMN to changes in the contour of acoustic stimulation, as contours are a central element of music. As already mentioned, Tervaniemi et al. (2001) found, in an ignore condition, an MMN to changes in the contour of transposed melody-like tone patterns. This MMN, however, was observed in musical subjects only (and only after they had tried to detect the deviant patterns in an active discrimination condition). These musical subjects were musicians who could play their instruments without a score.

Several other studies, too, found musicians to be superior to non-musicians in processing musical sound material. For example, Vuust et al. (2005) observed that expert jazz musicians had a larger and earlier MMNm response to subtle deviations in rhythm than musically inept non-musicians. Furthermore, the musicians’ MMNm for these deviations was left-lateralized, whereas the (smaller) MMNm of the non-musicians was right-lateralized. The authors suggested that this left-lateralization reflects a functional adaptation of the musicians’ brains to a communicative task that, when they play jazz together, is much like that of language, with subtle rhythmic deviations serving as signals of musical communication.

Trainor et al. (2002), however, found that even non-musicians’ brains automatically process melodic information (see also Koelsch et al., 2000). The authors observed an MMN to changes both in contour (the up–down pattern of pitch change) and in interval (the exact pitch distances between notes) in the absence of absolute frequency information. They stressed that, of these two forms of information, precise interval processing is specific to music (and is greatly affected by musical training), whereas contour information is important in both the musical and the speech domain. Subsequently, however, Fujioka et al. (2004) found that both contour- and interval-change MMNms are larger in amplitude in musicians than in non-musicians. Importantly, the MMNm elicited by a simple frequency change was of very similar amplitude in the two groups. This suggests that the MMN (MMNm) enhancement due to musical training/talent might be confined to musical stimuli, being more abstract than sensory in nature (see also Tervaniemi and Brattico, 2004). (There were no differences in the hemispheric distribution, however.) Consistent with this, the authors concluded that musical training mainly affects the encoding of pitch contour and interval relations between tones rather than the encoding of single tones.
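The distinction between contour and interval information, and the fact that transposition removes absolute-frequency cues while preserving both, can be sketched with MIDI note numbers (one unit per semitone). In the Python sketch below, the melody and the two deviants are hypothetical, not the stimuli of Trainor et al. (2002) or Fujioka et al. (2004): the contour deviant alters the up-down pattern, whereas the interval deviant keeps the pattern and changes only an exact pitch distance.

```python
def contour(notes: list[int]) -> list[int]:
    """Up/down pattern: +1 rising, -1 falling, 0 repeated (MIDI note numbers)."""
    return [(b > a) - (b < a) for a, b in zip(notes, notes[1:])]


def intervals(notes: list[int]) -> list[int]:
    """Exact pitch distances between successive notes, in semitones."""
    return [b - a for a, b in zip(notes, notes[1:])]


def transpose(notes: list[int], shift: int) -> list[int]:
    """Move every note by the same number of semitones (a change of key)."""
    return [n + shift for n in notes]


standard = [60, 64, 67, 65, 64]          # hypothetical five-note melody (60 = C4)
contour_deviant = [60, 64, 67, 69, 64]   # 4th note now rises: contour changes
interval_deviant = [60, 64, 67, 66, 64]  # 4th note still falls, by a different step

# Transposition removes absolute pitch but keeps both contour and intervals.
shifted = transpose(standard, 5)
print(contour(shifted) == contour(standard), intervals(shifted) == intervals(standard))  # True True
```

Because every exemplar can be presented in a different key, only the relational codes returned by `contour` and `intervals` stay constant across standards, which is what makes an MMN to either kind of change informative about abstract melodic encoding.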

For further MMN studies showing superior auditory preattentive processing in musicians, see Fujioka et al., 2005, Koelsch et al., 1999, Koelsch et al., 2002, Lopez et al., 2003, Rüsseler et al., 2001, Tervaniemi et al., 1997a, Tervaniemi et al., 2005b, van Zuijen et al., 2004, van Zuijen et al., 2005, Brattico et al., 2002a, Brattico et al., 2002b. On the basis of their results, Koelsch et al. (1999) concluded that “the superior discrimination performance of musicians is not only due to processing at higher cognitive levels but also to pre-attentive memory-based processing. Contrary to phoneme processing, this superior automatic discrimination is most probably not due to long-term stored representations (i.e. permanent sensory memory traces), but due to an elaborated mechanism of information acquirement underlying the generation of MMN” (p. 1313).

The MMN has also been used to investigate the processing of musical syntax (e.g., Koelsch et al., 2003a, Koelsch et al., 2003b, Loui et al., 2005, Leino et al., 2007). In these studies, chords with an irregular harmonic function presented within sequences of chords elicited an early anterior negativity which has also been denoted as music-syntactic MMN (Koelsch et al., 2003a, Koelsch et al., 2003b).

Recently, the MMN has also been used to assess music perception in cochlear-implant users in whom a timbre MMN, though one smaller in amplitude than that in controls, can be elicited (Koelsch et al., 2004).

For recent reviews on the import of the MMN in studies of music perception, see Tervaniemi and Brattico, 2004, Münte et al., 2002, Koelsch and Siebel, 2005.

12. Abstract-feature MMNs

As already reviewed, the pre-attentive auditory analysis reflected by the MMN is not restricted to physical, or “first-order”, stimulus features but also includes more complex invariances based on the relationships between various physical stimulus features, either within individual stimuli or between successive stimuli (e.g., Paavilainen et al., 1999). In these so-called “abstract-feature” MMN studies, there is no physically identical, repetitive standard stimulus but rather a class of several physically different “standard” stimuli. The invariant, “abstract”, feature uniting the various exemplars of the standard stimuli is a common rule that they all obey.

In a pioneering study, Saarinen et al. (1992) presented their subjects with tone pairs (two 60-ms tone pips separated by a 40-ms silent gap; silent inter-pair interval 640 ms). The position of the tone pairs on the frequency scale varied randomly over a wide range, so that there was no physically identical, repetitive standard stimulus. Instead, the constant feature of the standard pairs was an “abstract” or “second-order” one, namely, the direction of the frequency change within a pair: all standard pairs were ascending (i.e., the second tone of a pair was higher in frequency than the first), whereas the deviant pairs were descending. Thus, the abstract attribute was based on a rule defining the relationship between the simple physical, first-order attributes of the two tones forming a pair. Nevertheless, the deviant pairs elicited an MMN in an ignore condition. This result showed that the preattentively formed sensory representations were capable of encoding abstract attributes corresponding to simple concepts (“rise”, “fall”), that is, of deriving a common invariant feature from a set of individually varying physical events. For similar results in children aged 8–14 years, see Gumenyuk et al. (2003). Furthermore, Korzyukov et al. (2003), using a similar paradigm, localized the source of the abstract-feature MMN to the auditory cortex with EEG and MEG recordings. (For studies corroborating and extending these findings, see Paavilainen et al., 1995, Paavilainen et al., 1998.) For studies showing abstract-feature MMNs in newborns, see Ruusuvirta et al., 2003, Ruusuvirta et al., 2004, and Carral et al. (2005).
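The logic of this paradigm can be sketched in Python as a stimulus schedule. The timing constants follow the text (a 60-ms tone, a 40-ms gap, a 60-ms tone, then 640 ms of silence, giving an 800-ms onset-to-onset period), whereas the frequency range, the step sizes, and the deviant probability are hypothetical choices, not those of Saarinen et al. (1992).

```python
import random

TONE_MS = 60          # duration of each tone pip (from the text)
GAP_MS = 40           # silent gap within a pair (from the text)
INTER_PAIR_MS = 640   # silent interval between pairs (from the text)
PERIOD_MS = 2 * TONE_MS + GAP_MS + INTER_PAIR_MS  # 800 ms onset-to-onset


def pair_schedule(n_pairs: int, p_deviant: float, seed: int = 0):
    """Yield (onset_ms, f1_hz, f2_hz, is_deviant) for each tone pair.

    The frequency level roves randomly, so pairs are physically different;
    only the direction of the within-pair step is constant: standards
    ascend (f2 > f1) and deviants descend (f2 < f1). The frequency range,
    step sizes, and p_deviant are hypothetical.
    """
    rng = random.Random(seed)
    for k in range(n_pairs):
        f1 = rng.uniform(500.0, 2000.0)      # roving frequency level
        ratio = rng.uniform(1.05, 1.25)      # size of the within-pair step
        is_deviant = rng.random() < p_deviant
        f2 = f1 / ratio if is_deviant else f1 * ratio
        yield k * PERIOD_MS, f1, f2, is_deviant


for onset, f1, f2, dev in pair_schedule(5, 0.2, seed=1):
    print(onset, round(f1), round(f2), "deviant" if dev else "standard")
```

Because the frequency level roves over a range much wider than the within-pair step, a detector sensitive only to absolute frequency could not separate deviant from standard pairs; only the direction of change distinguishes them.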

As already reviewed, in addition to the direction of frequency and intensity change, even invariant frequency ratios (musical intervals) can be automatically derived from acoustically varying stimulation (Paavilainen et al., 1999).

What might be the ecological validity of such studies? The extraction of invariant relationships from physically varying auditory stimulation is of critical importance to higher perceptual-cognitive functions such as the processing of speech and music. For example, as already reviewed, one usually categorizes phonemes correctly despite considerable variation in the physical “surface” features of the speech signal resulting from the acoustically different voices of various speakers (e.g., male or female) and from the word context. Shestakova et al.’s (2002b) MEG study, reviewed above, clearly demonstrated the existence and operation of such category memory traces, which can recognize the corresponding phonemes even in the presence of wide acoustical variation.

Similarly, in music, we recognize melodies irrespective of the key into which they are transposed or the instrument on which they are played. In their aforementioned study, Tervaniemi et al. (2001) used standard stimuli consisting of a melodic pattern presented at randomly varying frequency levels (simulating a melody randomly played in different keys). Nevertheless, occasional slight contour changes in these patterns, which varied widely in frequency level, elicited an MMN. This MMN was especially prominent in musicians who perform music primarily without a score. Even in these subjects, however, some training and attention to the auditory stimuli were required for the MMN to be elicited (in a later stimulus block). After learning, their auditory cortex detected contour changes even when attention was directed away from the sounds.

MMN studies can thus reveal neural mechanisms underlying perceptual invariances essential for higher-level auditory processing. In a further study, Paavilainen et al. (2003a) sought to determine whether the preattentive sound-analysis mechanisms can extract invariant relationships based on abstract conjunctions of two sound features. Their stimuli were sinusoidal tone pips presented with a silent ISI of 400 ms. The stimuli varied randomly over a large range in two feature dimensions (frequency and intensity), so that there was neither a physically constant, repetitive standard stimulus nor a physically constant, repetitive feature conjunction. However, a constant relationship between the frequency and intensity of the various standard exemplars was defined by a linear, abstract conjunction rule: “the higher the frequency of a stimulus, the louder its intensity”. Subjects ignoring the sounds were presented with occasional deviant stimuli that violated this regularity by obeying the opposite rule (for example, an occasional high-pitched tone with a weak intensity). Even such deviant stimuli elicited an MMN, demonstrating that the preattentive processing of auditory stimuli extends to complex relationships between different stimulus features.

In a separate, active condition, subjects were asked to press a button to any stimuli that they somehow felt to be “deviant”. The rule separating the deviant from the standard stimuli was not revealed to them beforehand. Although subjects detected deviants fairly well, in subsequent interviews they were usually unable to explain why they had pressed the button to some of the stimuli. Thus, the information extracted by the MMN mechanism was utilized in the active detection condition, but it appears to have been in an implicit form, difficult to express verbally. These findings might therefore also be relevant to the controversial issue of implicit memory and learning (see, e.g., Shanks and St. John, 1994).
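The conjunction rule of Paavilainen et al. (2003a) can be made concrete with a small Python sketch in which standards follow a linear frequency-to-intensity mapping and deviants follow the opposite mapping. The frequency and intensity ranges, the tolerance, and the decision to draw deviants only from the outer thirds of the frequency range are hypothetical choices for illustration; the 400-ms ISI mentioned above is not modelled.

```python
import random

F_MIN, F_MAX = 500.0, 2000.0  # hypothetical frequency range, Hz
I_MIN, I_MAX = 50.0, 80.0     # hypothetical intensity range, dB SPL


def rule_intensity(freq: float) -> float:
    """Standard rule: intensity increases linearly with frequency."""
    return I_MIN + (freq - F_MIN) / (F_MAX - F_MIN) * (I_MAX - I_MIN)


def make_stimulus(rng: random.Random, deviant: bool) -> tuple[float, float]:
    """Return (frequency_hz, intensity_db) for a standard or a deviant."""
    if not deviant:
        f = rng.uniform(F_MIN, F_MAX)
        return f, rule_intensity(f)
    # Deviants obey the opposite rule. They are drawn from the outer thirds
    # of the frequency range, because near the middle both rules predict
    # nearly the same intensity.
    third = (F_MAX - F_MIN) / 3
    if rng.random() < 0.5:
        f = rng.uniform(F_MIN, F_MIN + third)
    else:
        f = rng.uniform(F_MAX - third, F_MAX)
    return f, I_MIN + I_MAX - rule_intensity(f)


def violates_rule(freq: float, intensity: float, tol_db: float = 3.0) -> bool:
    """True if the intensity departs from the standard rule by more than tol_db."""
    return abs(intensity - rule_intensity(freq)) > tol_db


rng = random.Random(0)
print(violates_rule(*make_stimulus(rng, False)), violates_rule(*make_stimulus(rng, True)))  # False True
```

No single frequency or intensity value marks a deviant here; only the relation between the two features does, which is the sense in which the rule is an abstract conjunction.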

Furthermore, two studies demonstrated that the MMN mechanism also forms extrapolatory traces representing forthcoming stimuli on the basis of the regularities or trends detected in the auditory past. In the previously reviewed study of Tervaniemi et al. (1994a) (Fig. 2), the stimuli consisted of a long sequence of steadily descending tones, i.e., tones following the rule that each tone is lower in frequency than the previous one. Occasional deviant stimuli (an ascending tone or a tone repetition) elicited an MMN. It is noteworthy that, again, all standard stimuli were physically different, whereas the deviant events were physically similar to stimuli that had occurred in the immediate auditory past.
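What counts as a deviant in this design can be stated as a rule check on the stimulus sequence. The Python sketch below flags tones that break "each tone is lower than the previous one"; it describes the stimulus design, not the neural mechanism, and the example frequencies are hypothetical rather than those used by Tervaniemi et al. (1994a).

```python
def descending_rule_violations(freqs: list[float]) -> list[int]:
    """Indices of tones that break 'each tone is lower than the previous one'.

    A repetition (equal frequency) and an ascent both count as violations.
    The first tone has no predecessor and is never flagged.
    """
    return [i for i in range(1, len(freqs)) if freqs[i] >= freqs[i - 1]]


# Hypothetical sequence (Hz): steady descent with one repetition and one ascent.
sequence = [1200, 1150, 1100, 1100, 1050, 1000, 1080, 1030]
print(descending_rule_violations(sequence))  # [3, 6]
```

The repetition at index 3 is physically identical to the tone just before it, and the ascending tone at index 6 lies within the range of recently heard frequencies, so neither deviant is novel in any physical sense; each is deviant only relative to the extrapolated trend.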

In another study along the same lines, Paavilainen et al. (2007) used sound stimuli that varied in two features, duration and frequency. The stimuli were either short (50 ms) or long (150 ms), and low (1000 Hz) or high (1500 Hz). All four combinations (short-low, short-high, long-low, long-high) were presented, each with p = .25, with an ISI of 300 ms. The duration of each stimulus was randomly either short or long. The stimulus sequences were constructed so that the duration of each stimulus predicted the frequency of the next one: (1) if the present stimulus is short, the next stimulus will be low in frequency; and (2) if the present stimulus is long, the next stimulus will be high in frequency. Occasional deviant events broke these rules: for example, a high-pitched stimulus following a short stimulus. In this design, each of the 4 stimulus combinations could appear either as a standard or as a deviant event, depending on the duration of the preceding stimulus. Nevertheless, the deviant events elicited, in an ignore condition, an MMN that peaked at 150–200 ms and reversed its polarity at the mastoids, suggesting a source in the auditory cortex.

In the subsequent attend conditions, subjects were asked to press a button to any stimuli they found somehow “strange” or “deviant” (the rules were not explained to them prior to the task). An MMN was again elicited, although subjects detected only about 15% of the deviant events, and none of them could verbally express the rules in the later interviews. The results suggest that the neural mechanism modeling the auditory environment may automatically “learn” the co-variation between the features of successive events and predict the properties of forthcoming stimuli; if these predictions are not fulfilled, the MMN is generated.
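The sequence structure of Paavilainen et al. (2007) can be sketched in Python. Durations and frequencies follow the text; the deviant probability, the sequence length, and the treatment of the first stimulus (which has no predecessor and is therefore never labelled a deviant) are hypothetical choices made for the illustration.

```python
import random

SHORT_MS, LONG_MS = 50, 150   # stimulus durations (from the text)
LOW_HZ, HIGH_HZ = 1000, 1500  # stimulus frequencies (from the text)


def predicted_frequency(previous_duration: int) -> int:
    """Rule: after a short stimulus comes a low one; after a long one, a high one."""
    return LOW_HZ if previous_duration == SHORT_MS else HIGH_HZ


def make_sequence(n: int, p_deviant: float, seed: int = 0) -> list[tuple[int, int, bool]]:
    """Return (duration_ms, frequency_hz, is_deviant) tuples.

    Durations are random (p = .5 each). Each frequency follows from the
    previous duration, except on deviant trials, where the rule is broken.
    The first stimulus has no predecessor and is never a deviant.
    p_deviant is a hypothetical parameter.
    """
    rng = random.Random(seed)
    seq = [(rng.choice([SHORT_MS, LONG_MS]), rng.choice([LOW_HZ, HIGH_HZ]), False)]
    for _ in range(n - 1):
        expected = predicted_frequency(seq[-1][0])
        is_deviant = rng.random() < p_deviant
        unexpected = HIGH_HZ if expected == LOW_HZ else LOW_HZ
        freq = unexpected if is_deviant else expected
        seq.append((rng.choice([SHORT_MS, LONG_MS]), freq, is_deviant))
    return seq


sequence = make_sequence(1000, 0.1)
as_deviant = {(d, f) for d, f, dev in sequence if dev}
as_standard = {(d, f) for d, f, dev in sequence[1:] if not dev}
print(sorted(as_deviant & as_standard))
```

The final lines list the (duration, frequency) combinations that occur in both roles, mirroring the point made above: whether a stimulus is a standard or a deviant cannot be read off its own physical features, only off its relation to the preceding stimulus.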

In conclusion, the MMN studies reviewed above suggest that the central auditory system performs surprising cognitive operations, such as generalization leading to simple concept formation, rule extraction, and anticipation of the next stimulus, even at the pre-attentive level, demonstrating a kind of “primitive sensory intelligence” in the auditory cortex (for a review, see Näätänen et al., 2001). The information extracted by the sensory-memory mechanisms often seems to be in an implicit form, not directly available to conscious processes and difficult to express verbally. These results are hence consistent with the framework suggested by Winkler et al. (1996a, 1996b), according to which the main function of the MMN process is to adjust a neural model to the various regularities of the auditory environment, enabling the central auditory system to manage a large part of its subsequent input automatically, i.e., without requiring the limited resources of the controlled-processing system.

13. Concluding discussion

In the present article, we have reviewed the main areas of basic research on cognitive brain function using the MMN since the late 1970s, when the MMN was first described and interpreted. In what follows, we list the principal trends of this research over this period of almost 30 years.

Acknowledgements

The authors wish to thank Dr. Friedemann Pulvermüller, Dr. Thomas Jacobsen, and Dr. Rika Takegata for their valuable comments on, and additions to, the previous version of the present manuscript, and Ms. Piiu Lehmus for her competent and patient text-editing work.

References
